Abstract


Theoretical and observational cosmology have enjoyed a number of significant
successes over the last two decades. Cosmic microwave background measurements
from the Wilkinson Microwave Anisotropy Probe and Planck, together with
large-scale structure and supernova (SN) searches, have put very tight
constraints on cosmological parameters. Type Ia supernovae (SNIa) played a
central role in the discovery of the accelerated expansion of the Universe,
recognised by the Nobel Prize in Physics in 2011.

The last decade has seen an enormous increase in the number of high-quality
SN observations, with SN catalogues now containing hundreds of objects. This
number is expected to increase to thousands in the next few years, as data from
next-generation missions, such as the Dark Energy Survey and the Large Synoptic
Survey Telescope, become available. In order to exploit the vast amount of
forthcoming high-quality data, it is extremely important to develop robust and
efficient statistical analysis methods to answer cosmological questions, most
notably determining the nature of dark energy.

To address these problems, my work is based on nested-sampling approaches
to parameter estimation and model selection, and on neural networks for machine
learning. Using advanced Bayesian techniques, I constrain the properties of
dark-matter haloes along the SN lines-of-sight via their weak gravitational
lensing effects, develop methods for classifying SNe photometrically from their
lightcurves, and present results on more general issues associated with
constraining cosmological parameters and testing the consistency of different SN
compilations.
Swedish summary


Theoretical and observational cosmology have enjoyed many important successes
in recent decades. Measurements of the cosmic microwave background from the
Wilkinson Microwave Anisotropy Probe and Planck, together with surveys of the
large-scale structure of the Universe and of supernovae, have placed stringent
constraints on the cosmological parameters. Type Ia supernovae played a central
role in the discovery of the accelerated expansion of the Universe, a discovery
that was rewarded with the Nobel Prize in 2011.

The last decade has brought an enormous increase in the number of high-quality
observations of supernovae, and catalogues now contain hundreds of objects.
This number is expected to grow to thousands within the next few years, as data
from next-generation observations such as the Dark Energy Survey and the Large
Synoptic Survey Telescope become available. To be able to exploit the large
amount of forthcoming data, it is extremely important to develop robust and
efficient techniques for statistical analysis in order to answer the
cosmological questions, above all concerning the nature of dark energy.

To tackle these problems, my work is based on parameter estimation and model
selection via nested sampling, together with neural networks for machine
learning. With the help of advanced Bayesian methods, I have placed limits on
the properties of dark-matter haloes along supernova lines of sight via their
weak gravitational effect, developed methods for the photometric classification
of supernovae from their light curves, worked on more general questions
associated with the determination of the cosmological parameters, and
investigated the consistency of different supernova compilations.
Thesis plan

This doctoral thesis consists of two major parts: (i) a short summary of SN
observations, cosmological constraints, the statistical methods used for these
analyses and my results; and (ii) the corresponding five articles published,
submitted or ready to be submitted for publication.

The first part of the thesis is further divided into seven chapters, the contents 
of which are briefly summarised below. 

* Chapter 1 motivates this work from a cosmological point of view. It gives
a short introduction to the history of cosmology, highlights its major
discoveries and outlines remaining significant challenges.

* Chapter 2 gives an overview of the numerical methods used in my work:
Bayesian methods and their use in parameter estimation and model
selection; scanning algorithms, concentrating on MultiNest; and neural
networks (NNs) and their applications.

* Chapter 3 discusses SN discovery, classification and the study of progenitor
models. It presents a historical overview of SN surveys and an outlook
on the current state of the field and the problems that future surveys will
bring. It also summarises the techniques for the standardisation of SNIa, and
their associated shortcomings.

* Chapter 4 discusses different techniques for cosmological parameter
inference, including the χ²-method and the Bayesian Hierarchical Method (BHM),
and assesses their advantages and drawbacks.

* Chapter 5 reviews gravitational lensing effects in astronomy; it gives a
short derivation of the lens equation and shows how one can view SNe
through gravitational telescopes.

* Chapter 6 summarises the main results in my papers and unpublished 
studies. 

* Chapter 7 gives a brief outlook on future challenges in SN cosmology 
and outlines, in this context, how one can develop further the methods 
presented in this work to address these issues. 
Chapter 1


Introduction 


‘You never know when you’ll luck 
out. Take it from one who knows.’ 

Max Frei 

In this chapter, I present a broad introduction to the standard model of
cosmology, on which the research presented in this PhD thesis is based. Rather
than presenting a detailed analytical description, which can be found in
numerous textbooks, I present a more qualitative, chronological account of its
development, which hopefully makes this material more accessible and places into
context how we have arrived at our current understanding of the Universe. The
original research in this thesis is focussed on observations of SNe and the
application of novel statistical methods to their analysis, and I will describe
these topics in subsequent chapters.

1.1 Relativistic gravitation, cosmology and the 
expanding universe 

Nearly a century has passed since Einstein published his completed theory of 
general relativity in November 1915. Within a month, Einstein discovered that 
his new theory could account precisely for a well-known “anomaly” in the orbit 
of Mercury. Moreover, in 1919, his prediction for the deflection of light from 
distant stars by the Sun was experimentally verified by Arthur Eddington, and 
Einstein became internationally famous. 

As early as 1917, Einstein realised that he had the necessary tools with which
to derive the first fully self-consistent model of the Universe as a whole. He
immediately faced a problem, however, in that his equations predicted the
Universe to expand or contract, which ran contrary to the prevailing belief at
the time that the Universe was static. In order to construct a static model for
the Universe, Einstein added to his equations an extra term that contained a new
constant of nature called the “cosmological constant”. By carefully fine-tuning
the value of the cosmological constant, Einstein constructed a static model of
the Universe. Through the mid 1920s, however, Friedmann, Lemaître and Robertson
all independently obtained the general solution to Einstein’s equations of
general relativity for an isotropic universe, each finding that, without
fine-tuning the value of the cosmological constant, the generic predicted
behaviour was for the universe to expand or contract.

Figure 1.1: Plot from Hubble (1929), titled “Velocity-Distance Relation among
Extra-Galactic Nebulae”, which shows that the redshift of a galaxy, interpreted
by Hubble as a speed of recession, is proportional to its distance.

Theory and observation came together in 1929, when Edwin Hubble combined his
distance estimates to a selection of spiral galaxies with exquisite
spectroscopic studies of the galaxies made nearly 20 years earlier by Vesto
Slipher. Such spectra may be used as a cosmic “bar-code” to identify particular
atoms from the pattern of narrow lines in the spectrum, and also as a
“radar-gun” to determine the velocity of the emitting material along the
line-of-sight by measuring the Doppler shift in the wavelength of the spectral
lines as compared with laboratory measurements on Earth. Slipher found that the
“spiral nebulae” were made from normal matter, but also discovered that their
observed spectral lines were all shifted significantly to longer wavelengths
(towards the red end of the visible spectrum of light). From these so-called
“redshifts” in the spectral lines¹, he thus deduced that the galaxies were all
moving away from us at considerable speeds. When Hubble compared the speeds of
recession of these galaxies with the distances that he had measured to them,
obtained by observing the periods of the Cepheid variable stars that they
contained, he made the astonishing discovery that the speed of recession of an
object is proportional to its distance, as shown in Figure 1.1. This was
interpreted as resulting from the Universe expanding uniformly in all
directions. When Einstein learned of Hubble’s results, he is said to have
described his inclusion of the additional cosmological constant term in the
equations of general relativity as “the biggest blunder” of his life. As we
will see later, however, posterity may judge otherwise.

¹The redshift $z$ is defined by $1 + z = \lambda_{\rm obs}/\lambda_{\rm em}$,
where $\lambda_{\rm obs}$ is the observed wavelength of the spectral line and
$\lambda_{\rm em}$ is its emitted wavelength, i.e. that measured in a
laboratory experiment.

Figure 1.2: Eight-year-old artists’ impressions of the Big Bang. Photograph
courtesy of Pontus Bergstrom.
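The redshift definition and Hubble's proportionality between recession speed and distance can be illustrated with a few lines of arithmetic. The following is only a minimal sketch: the function names and the example wavelengths are hypothetical, the velocity uses the low-redshift approximation v = cz, and the Hubble constant is set to the value of 67.3 km/s/Mpc quoted later in this chapter.

```python
C_KM_S = 299_792.458  # speed of light in km/s


def redshift(lambda_obs, lambda_em):
    """Redshift z from the definition 1 + z = lambda_obs / lambda_em."""
    return lambda_obs / lambda_em - 1.0


def recession_velocity(z):
    """Low-redshift approximation v = c z in km/s (only sensible for z << 1)."""
    return C_KM_S * z


def hubble_distance(v_km_s, h0):
    """Distance in Mpc from the linear relation v = H0 d, with H0 in km/s/Mpc."""
    return v_km_s / h0


# Hypothetical measurement: the H-alpha line (656.28 nm in the laboratory)
# observed at 662.84 nm in a galaxy spectrum.
z = redshift(662.84, 656.28)
v = recession_velocity(z)
d = hubble_distance(v, 67.3)  # assumed H0 = 67.3 km/s/Mpc
print(f"z = {z:.4f}, v = {v:.0f} km/s, d = {d:.1f} Mpc")
```

The linear relation is used here in both directions: Hubble inferred the constant of proportionality from measured velocities and distances, whereas the sketch assumes the constant and infers a distance.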

The expansion of the Universe, when combined with Einstein’s theory of
general relativity (with the simplifying assumptions of the large-scale
homogeneity and isotropy of space), laid the foundations for the development of
the standard Big Bang theory of cosmology (Figure 1.2), which remains to the
present day our best description of the Universe. The idea was first proposed
in 1932 by Georges Lemaître, who suggested that the observed expansion of the
Universe implied that, moving backwards in time, it must contract and would
continue to do so until all the matter in the Universe was contained in a single
point, a “primeval atom”, which marked the origin of the spacetime fabric
itself. Thus, running the expansion of the Universe backwards according to the
laws of general relativity, and extrapolating, implies that the matter had an
infinite density and temperature at a finite time in the past. Indeed, the
presence of such a singularity indicates the limit of applicability of the
theory of general relativity.
Based on a range of cosmological observations, it is currently estimated that
this occurred around 13.7 billion years ago. It is worth noting, however, that
the Big Bang theory cannot and does not provide any explanation for the initial
singularity, but instead describes and explains the general evolution of the
Universe since that instant, as it expanded from an extremely hot and dense
state at very early times to its cool and diluted state today.

The Big Bang theory was advocated and developed further in the late 1940s
and early 1950s by George Gamow, who introduced the idea that the nuclei
of the light elements, such as helium, deuterium and lithium, could be formed
from nuclear processes occurring in the rapidly expanding and cooling first
minutes of the Universe, following the Big Bang. His colleagues, Ralph Alpher
and Robert Herman, also determined the thermal history of the Universe in this
model and predicted the existence of the cosmic microwave background (CMB)
radiation, a near-uniform bath of thermal radiation, sometimes called “the
afterglow of creation”, that pervades the Universe. Alpher and Herman calculated
that, just 300,000 years after the Big Bang, the Universe would cool
sufficiently for the ionised gas of mostly free protons and electrons (plus
other light nuclei) to combine to form neutral atoms (predominantly hydrogen),
marking a sharp transition from an opaque charged-particle plasma to a
transparent neutral gas through which photons can travel unhindered, stretching
as the Universe expands, until they are observed today as the CMB, a thermal
blackbody radiation characterised by a temperature of just a few kelvin.

By the early 1960s, observations also revealed that the percentage by mass
of helium in the Universe was around 23%. The uniformity of this value and the
fact that it was much greater than what could be created in the cores of
stars pointed to a cosmic origin, as suggested earlier by Gamow. Hoyle showed
that such a percentage of helium was indeed predicted to be synthesised in the
early stages of the Big Bang. Subsequent calculations by Fowler and Wagoner
showed that Big Bang nucleosynthesis also produced traces of other light
elements, which were very difficult to form inside stars. The predicted
abundances matched observations very well. The status of the Big Bang as our
best theory for the origin and evolution of the cosmos was secured, however, by
the serendipitous discovery of the CMB by Penzias and Wilson in 1964. While
preparing the 20-foot horn-shaped antenna at Bell Laboratories to perform some
radio-astronomical observations, they discovered an excess of radiation at a
temperature of around 3 kelvin, wherever they pointed the telescope in the sky.
After exhaustive efforts to find the source of this emission, which even
included scraping out bird droppings from the inside of the antenna, Penzias and
Wilson realised that the signal must be the CMB predicted by Alpher and Herman.
Indeed, this oldest light in the Universe has since proven to be a great gift to
cosmology, since it provides an early-childhood snapshot of the Universe when
it was just a tiny fraction of its current age. After the discovery of the CMB,
and especially when its spectrum was measured to be precisely that of thermal
radiation from a black body, most cosmologists were persuaded that some version
of the Big Bang scenario must have taken place.


1.2 Cosmic structure and dark matter 

Since the 1960s, most work in cosmology has been carried out in the context of
the Big Bang model, and devoted in particular to understanding how large-scale
structure in the Universe, such as galaxies and clusters, forms. This has led to
many surprises and, at times, considerable scepticism about the standard Big
Bang theory, which has had to evolve considerably to match increasingly accurate
observations. Most notably, it has proven necessary to postulate the existence
of a new form of matter that interacts only gravitationally with normal baryonic
matter and is hence invisible: dark matter.

The earliest evidence suggesting the existence of dark matter came from
observations of the distribution of velocities of the galaxies within galaxy
clusters. In 1933, Zwicky noticed that outer members of the Coma cluster are
moving far too quickly to be merely tracing the gravitational potential of the
visible cluster mass. In order to make the observed velocities consistent with
the virial theorem, one needed to postulate that the cluster also contained
additional matter, which could not be seen. Observations by Babcock in 1939
showed that this was also the case on the scale of the individual galaxy
Andromeda. More extensive observations by Rubin and Ford during the 1960s and
1970s of the rotation curves of numerous edge-on spiral galaxies clearly showed
that the (near) circular velocities of their constituent stars, as a function of
distance from the centre of the galaxy, remain approximately constant out to the
observable extent of the galactic disc. This is in stark contrast to the
fall-off in circular velocities expected from the visible matter contained
within the stars’ orbits, and suggests the presence of a large dark matter
“halo” in which the visible galaxy is embedded.
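The contrast between the expected fall-off and the observed flat rotation curves can be made concrete with Newtonian circular orbits, for which v² = GM(<r)/r. The sketch below is illustrative only: the visible mass of 10¹¹ solar masses and the flat velocity of 200 km/s are hypothetical round numbers, not values taken from any of the observations cited above, and the visible mass is treated as if it were all enclosed within the smallest radius considered.

```python
import math

# Gravitational constant in units of kpc (km/s)^2 / Msun
G = 4.30091e-6


def keplerian_velocity(m_enclosed_msun, r_kpc):
    """Circular velocity (km/s) at radius r around a fixed enclosed mass."""
    return math.sqrt(G * m_enclosed_msun / r_kpc)


def enclosed_mass(v_km_s, r_kpc):
    """Mass (Msun) required inside r to sustain circular velocity v."""
    return v_km_s**2 * r_kpc / G


M_VISIBLE = 1.0e11  # hypothetical visible mass in Msun
V_FLAT = 200.0      # hypothetical flat rotation velocity in km/s

for r in (10.0, 20.0, 40.0):
    v_kep = keplerian_velocity(M_VISIBLE, r)
    m_dyn = enclosed_mass(V_FLAT, r)
    print(f"r = {r:4.0f} kpc: Keplerian v = {v_kep:5.1f} km/s, "
          f"mass implied by flat curve = {m_dyn:.2e} Msun")
```

With a fixed enclosed mass the velocity falls as r^(-1/2), whereas a flat curve requires the enclosed mass to grow linearly with radius, which is the behaviour attributed to the dark matter halo.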

The main argument for dark matter comes, however, from the central problem
for the original Big Bang model: explaining the formation of galaxies. As
early as the 1930s and 1940s, Lemaître, Tolman and Lifshitz all independently
showed that density perturbations in an expanding universe grow quite slowly
under their own self-gravity, with their density contrast relative to the
background increasing only in proportion to the growth of the overall scale
factor of the universe. Assuming galaxies evolved from infinitesimal density
fluctuations in the very early Universe, they inferred that galaxies could not
have formed by the current epoch. This is clearly contrary to observations,
which in the 1950s and 1960s began to uncover the large-scale distribution of
galaxies and clusters in the Universe through the work of Neyman, Abell and
Zwicky. Moreover, in the 1960s and 1970s, a number of theoretical cosmologists,
including Harrison, Zel’dovich, Peebles and Silk, had shown that to form these
galaxies and clusters, the small density perturbations from which they evolved
should leave imprints in the temperature distribution of the CMB across the sky.
By 1980, however, the predicted amplitude of these temperature fluctuations
exceeded observational limits on anisotropies in the CMB and clearly a
fundamental change was needed in our understanding of the formation of structure
in the Universe. This led the theoretical cosmologist Jim Peebles to suggest
that the Universe might be dominated by a hitherto unknown form of matter, now
called dark matter, that interacts only very weakly with normal (baryonic)
matter, of which we and everything we see around us are composed. This allows
for dark matter fluctuations to form, into which normal matter can later “fall”,
without imprinting excess temperature variations in the CMB.

Figure 1.3: The top panel presents the CfA2 “Great Wall”, centred on the Coma
cluster, and the left panel shows one-half of the 2dFGRS; both represent real
measurements. The bottom and right panels present simulated counterparts
constructed from the “Millennium” simulation, using geometries and magnitude
limits matching the corresponding surveys. Credit: Springel, Frenk, & White
(2006).

Detailed numerical simulations showed this model to be remarkably successful
in accounting for the large-scale distribution of structure in the Universe
(Figure 1.3). On large scales, galaxies are collected into clusters, clusters
are part of superclusters, and superclusters are arranged into large-scale
sheets, filaments and voids. According to cosmological “N-body” simulations
(e.g. Navarro et al. 1996; Springel et al. 2005; Diemand et al. 2007; Springel
et al. 2008; Diemand et al. 2008), the formation of the observed large-scale
structure of luminous matter could only have taken place in the presence of a
substantial amount of dark matter. In addition, most of the dark matter has to
be both cold and non-dissipative to enable the production of the observed
structures. “Cold” in this context means that it moves non-relativistically, and
thus has a short free-streaming length (for example, smaller than the size of a
gas cloud undergoing gravitational collapse). Being cold implies that the dark
matter can gravitationally aggregate on small scales and hence seed the
formation of galaxies, as mentioned above, while being non-dissipative prevents
it from cooling and collapsing with the luminous matter and overproducing
galactic discs.

Most importantly, when the Cosmic Background Explorer satellite detected
anisotropies in the CMB for the first time in 1992, they were found to be at a
level of about one part in 100,000 relative to the 2.73 kelvin background, which
is consistent with structure formation in the cold dark matter (CDM) scenario.
During the following decade, the CMB anisotropies were measured with
increasing accuracy by a large number of ground-based and balloon-borne
experiments, by the Wilkinson Microwave Anisotropy Probe (WMAP) satellite over
the period 2002-2009, and most recently by the Planck satellite, which
completed its observations in 2013. All these observations remain consistent
with the CDM model. Moreover, the power in the CMB anisotropies measured on
different angular scales is consistent with the observed large-scale correlations
in the distribution of galaxies. This observation of Baryon Acoustic Oscillations
(Albrecht et al. 2006) strongly supports the idea that cosmic structure formed
from the passive gravitational collapse of primordial density perturbations, the
imprints of which we see in the CMB.

Indeed, CMB observations can be combined with independent measurements
of $\Omega_{\rm b,0}$ (the present-day baryonic matter density) and
$\Omega_{\rm m,0}$ (the present-day total matter density) from Big Bang
nucleosynthesis and large-scale structure observations, respectively, to provide
very strong evidence for the existence of dark matter. Using the latest Planck
results, one obtains posterior mean values and 68 per cent credible intervals of
$\Omega_{\rm m,0} = 0.314 \pm 0.020$, $\Omega_{\rm b,0}h^2 = 0.02207 \pm 0.00033$
and $\Omega_{\rm dm,0}h^2 = 0.1196 \pm 0.0031$, indicating that dark matter
must be predominantly non-baryonic.²


²The present-day density parameter for the $i$th component is defined as
$\Omega_{i,0} = 8\pi G\rho_{i,0}/(3H_0^2)$, where $\rho_{i,0}$ is its physical
density and $H_0 = 100h$ km s$^{-1}$ Mpc$^{-1}$ is the present-day value of the
Hubble parameter, which is estimated to be $H_0 = 67.3 \pm 1.2$ km s$^{-1}$
Mpc$^{-1}$ from combined cosmological probes (Planck Collaboration et al. 2013).
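As a quick consistency check on the numbers quoted above, the physical densities can be converted to density parameters by dividing by h², and the critical density follows from the definition in the footnote. This is a sketch using only the central values (no uncertainties are propagated); the variable names and the rounded physical constants are my own choices for illustration.

```python
import math

# Central values quoted in the text (Planck 2013)
H0_KM_S_MPC = 67.3        # Hubble constant in km/s/Mpc
OMEGA_B_H2 = 0.02207      # baryon density, Omega_b,0 h^2
OMEGA_DM_H2 = 0.1196      # dark matter density, Omega_dm,0 h^2

# Rounded physical constants (SI)
G_SI = 6.674e-11          # m^3 kg^-1 s^-2
MPC_IN_M = 3.0857e22      # metres per megaparsec

h = H0_KM_S_MPC / 100.0   # dimensionless Hubble parameter

omega_b = OMEGA_B_H2 / h**2
omega_dm = OMEGA_DM_H2 / h**2
omega_m = omega_b + omega_dm
baryon_fraction = omega_b / omega_m

# Critical density rho_crit = 3 H0^2 / (8 pi G), with H0 converted to s^-1
h0_si = H0_KM_S_MPC * 1.0e3 / MPC_IN_M
rho_crit = 3.0 * h0_si**2 / (8.0 * math.pi * G_SI)

print(f"Omega_b = {omega_b:.4f}, Omega_dm = {omega_dm:.4f}, "
      f"Omega_m = {omega_m:.4f}")
print(f"baryons make up {100 * baryon_fraction:.1f}% of the matter")
print(f"critical density = {rho_crit:.2e} kg/m^3")
```

The sum of the baryonic and dark components reproduces the quoted total matter density to well within its uncertainty, and baryons account for only about one sixth of the matter.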
1.3 Gravitational lensing

A striking visual representation of the presence of dark matter and a beautiful
illustration of general relativity is provided by gravitational lensing. General
relativity predicts that light rays are bent around massive bodies or, more
generally, undergo deflections when they traverse a region in which the
gravitational field is inhomogeneous. In this manner, light from background
objects can be “lensed” by massive objects in the foreground as the path of
light passing through the gravitational field of the foreground object is bent.
The deflection of light is not just a relativistic effect, but is also predicted
by Newton’s theory of gravitation, when one considers light to be made up of a
stream of particles. The Newtonian approach does, however, predict only one-half
of the deflection predicted by general relativity. In the latter theory, a light
beam that just grazes the surface of the Sun suffers a deflection of 1.75
arcseconds, whereas Newton’s theory predicts just 0.87 arcseconds. Indeed, as I
mentioned previously, the observational confirmation of the larger value by
Eddington in 1919 was a key factor in leading the scientific community to accept
Einstein’s description of gravity in terms of general relativity.
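The two deflection angles quoted above can be reproduced from the standard point-mass expressions, 4GM/(c²b) in general relativity and 2GM/(c²b) in the Newtonian particle picture, where b is the impact parameter. The sketch below assumes a ray grazing the solar limb (b equal to the solar radius) and uses rounded values for the physical constants; the function names are hypothetical.

```python
import math

# Rounded physical constants (SI)
G = 6.674e-11      # m^3 kg^-1 s^-2
C = 2.998e8        # m/s
M_SUN = 1.989e30   # kg
R_SUN = 6.957e8    # m

RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0


def deflection_gr(mass, impact_parameter):
    """General-relativistic deflection angle (radians) for a point mass."""
    return 4.0 * G * mass / (C**2 * impact_parameter)


def deflection_newton(mass, impact_parameter):
    """Newtonian deflection angle (radians) for a stream of particles."""
    return 2.0 * G * mass / (C**2 * impact_parameter)


gr_arcsec = deflection_gr(M_SUN, R_SUN) * RAD_TO_ARCSEC
newton_arcsec = deflection_newton(M_SUN, R_SUN) * RAD_TO_ARCSEC
print(f"GR: {gr_arcsec:.2f} arcsec, Newton: {newton_arcsec:.2f} arcsec")
```

Both expressions scale as M/b, so the same functions give the order of magnitude of the deflection by a galaxy or cluster once its mass and the impact parameter are specified.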

Very massive astronomical objects, which lie at large distances from the
Earth, can exert such a strong gravitational effect on light rays that pass near
them that a single background source can be observed as multiple images. This
phenomenon was predicted very early on in the study of gravitational lensing,
but was only observed for the first time in 1979 (Walsh et al. 1979); since
then, gravitational lensing has become a major area of research in astrophysics.
Many tens of systems containing multiple images have now been observed. Of
greatest interest to astrophysics is that the analysis of the distribution and
shape of these multiple images can be used to derive an accurate estimate of the
mass distribution in the lensing object. In particular, when light from distant
galaxies is lensed by a cluster, one can see evidence of significant
gravitational lensing, far more than can result from the observed distribution
of luminous matter in the foreground cluster, thereby implying the presence of
dark matter (e.g. Tyson et al. 1998; Massey et al. 2007). Figure 1.4 shows an
example of such a cluster.

In most cases, however, the effects of gravitational lensing are far more
subtle. Typically, the size and shape of background objects are only very
slightly changed. The nature of this change is very difficult to determine for
an individual object, since one does not know a priori its unlensed intensity
distribution. One therefore has to average the effect over a large number of
background objects to obtain a statistical measure of this weak-lensing signal.
Indeed, such observations can be used to investigate the nature of cosmic
structures, provided one has a large collection of lensed background objects to
analyse.

Figure 1.4: The left panel shows an image of a gravitationally lensed cluster,
while the right panel shows the mass map of the foreground cluster. Credit:
Greg Kochanski, Ian Dell’Antonio, and Tony Tyson (Bell Labs, Lucent
Technologies).


1.4 SNe, universal acceleration and dark energy 

In the last two decades, there has been an unexpected twist in the story of
cosmology. In 1998, measurements of the redshift-magnitude relation for SNIa,
which can be used as “standard candles” in cosmology, indicated that, when the
Universe was around half of its present age, its expansion underwent a
transition from a decelerating phase into an accelerating one, which continues
to the current epoch. This came as a complete surprise, as it was thought that
the expansion should decelerate as a result of the attractive gravitational
force between all objects. To explain an accelerating universe, one has to posit
some additional component of the universe, known generically as “dark energy”,
which has a large negative pressure and thus leads to a gravitational repulsion.
Amazingly, the simplest form for such a component is provided precisely by the
additional cosmological constant term (or Λ-term) that Einstein included in his
equations of general relativity when trying to build a static universe model,
but then rejected as his “biggest blunder” when he learned that the Universe is
expanding. The resulting “ΛCDM” scenario is our best current cosmological model,
which describes all existing observations. Indeed, results from WMAP, Planck and
other CMB observations, combined with galaxy surveys of large-scale structure,
are all consistent with a ΛCDM model, known as the “concordance cosmology”, in
which the total mass/energy density budget of the Universe at the present time
comprises approximately 73% dark energy, 23% dark matter and 4% ordinary matter
(see Figure 1.5), and in which structure forms from the passive gravitational
evolution of scale-invariant perturbations generated (somehow) in the very early
Universe.

Figure 1.5: 68% and 95% confidence contours for the present-day cosmological
density parameters $\Omega_{\rm m,0}$ and $\Omega_{\Lambda,0}$. Labels for the
various data-sets correspond to the Betoule et al. (2014) SNIa compilation
(JLA), the Conley et al. (2011) SNIa compilation (C11), the combination of
Planck temperature and WMAP polarization measurements of the CMB fluctuations
(PLANCK+WP), and a combination of measurements of the baryon acoustic
oscillation scale (BAO). The black dashed line corresponds to a flat universe.
Credit: Betoule et al. (2014).
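To see why this energy budget implies acceleration today, one can evaluate the present-day deceleration parameter, which for a universe containing only pressureless matter and a cosmological constant is q₀ = Ω_m,0/2 − Ω_Λ,0 (radiation is neglected). The sketch below uses the rounded percentages quoted above; the function names are hypothetical, and the onset redshift follows from requiring that the matter density equal twice the dark energy density.

```python
# Rounded concordance values quoted in the text
OMEGA_LAMBDA = 0.73
OMEGA_M = 0.23 + 0.04  # dark matter plus ordinary matter


def deceleration_parameter(omega_m, omega_lambda):
    """Present-day q0 for matter plus a cosmological constant."""
    return 0.5 * omega_m - omega_lambda


def acceleration_onset_redshift(omega_m, omega_lambda):
    """Redshift at which omega_m (1 + z)^3 = 2 omega_lambda, i.e. q = 0."""
    return (2.0 * omega_lambda / omega_m) ** (1.0 / 3.0) - 1.0


q0 = deceleration_parameter(OMEGA_M, OMEGA_LAMBDA)
z_acc = acceleration_onset_redshift(OMEGA_M, OMEGA_LAMBDA)
print(f"q0 = {q0:.3f} (negative means accelerating)")
print(f"acceleration began at z = {z_acc:.2f}")
```

A negative q₀ corresponds to accelerated expansion, and the transition redshift of order unity lies within the range probed by the SNIa samples discussed in this thesis.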


1.5 Inflation, uncertainty and the future 

There remain numerous open questions in cosmology. Indeed, many
cosmologists view the current standard model of cosmology with considerable
scepticism. In addition to the unknown physical nature of both dark matter and
dark energy, which supposedly dominate our Universe, our understanding of
fundamental physics is only sufficient to project back to around one
ten-billionth of a second after the Big Bang, at which epoch the typical
densities and energies of particles are at the limit of what can be reached in
the latest particle physics experiments, such as the Large Hadron Collider (e.g.
The ATLAS Collaboration: G. Aad et al. 2008; LHC Higgs Cross Section Working
Group et al. 2011). At earlier times, the physics of the Big Bang is subject to
considerable speculation and doubt. The most popular current model, known as the
theory of
inflation, proposes that, almost immediately following the Big Bang, the
Universe underwent a short period of exponential expansion, growing in size by
many orders of magnitude within a minute fraction of a second, during which
microscopic quantum fluctuations in the matter fields at the time were stretched
to macroscopic scales to generate the seeds of structure formation. Indeed, this
mechanism for the origin of all the structure in the Universe was proposed by
Guth, Linde and Starobinsky, amongst others, in the early 1980s. The
inflationary model also solves a number of other problems, such as explaining
why the Universe is so homogeneous and isotropic on the largest scales. Very
recent support for the inflationary paradigm has potentially been provided by
the BICEP2 experiment (Ade et al. 2014; BICEP2 Collaboration et al. 2014), which
claims to have observed polarised emission from CMB anisotropies that is
consistent with the presence of primordial gravitational waves, which are also
predicted to be produced (almost exclusively) by inflation. There are currently,
however, some concerns regarding the interpretation of the BICEP2 results, and
further experimental verification is required.

Although an attractive proposal, inflation does, however, have some
theoretical problems of its own. In particular, determining the initial
conditions for inflation is both conceptually and technically very demanding,
and it may be the case that producing a period of inflation that is consistent
with observations requires an unacceptable level of fine-tuning in the theory,
but this is far from certain. Thus, cosmology now finds itself again in a period
rich in alternative models, the development of which is driven by scepticism
about our existing description of the Universe. Only time will tell whether
(another) revolution in our thinking is required, but, based on the experience
of the last one hundred years, it seems very likely.
Chapter 2


Statistical methods 


‘There are three kinds of lies: lies, 
damned lies, and statistics.’ 

Benjamin Disraeli 

There are two ways to define probability. One of them is Frequentist, which
postulates that probability is “the ratio of the times the event occurs in a test
series to the total number of trials in the series” (D’Agostini 1995), or the
“frequencies of outcomes in random experiments” (MacKay 2003), and the other is
Bayesian, which postulates that probability is “a measure of the degree of
belief that an event will occur” (D’Agostini 1995). In my research, I choose to
follow the Bayesian interpretation of probability, since it provides the only
self-consistent extension of Boolean algebra to propositions that are not simply
true or false, but are associated with a degree of belief (defined to lie
between 0 and 1) (Cox 1946).

On a more practical note, the recent development of efficient sampling al¬ 
gorithms makes the implementation of Bayesian methodology more straightfor¬ 
ward and reliable than its Frequentist counterpart. The central reason for this lies 
in what Bayesians and Frequentists consider to be the most important properties 
of probability distributions. In contrasting the two statistical schools, much
attention is usually paid to the issue of priors. It is indeed true that Bayesians
consider the posterior probability, which is proportional to the product of the
likelihood and the prior, as the primary distribution for inference. By contrast,
the Frequentists
tend to eschew the notion of priors and concentrate on the likelihood alone. 
This difference is, however, often overstated and leads practically to very lit¬ 
tle difference in the final inferences. A far more profound practical difference 
between the schools is that Bayesians consider probability mass as most impor¬ 
tant, whereas Frequentists consider the point-value of the probability as primary. 
Thus, given a probability distribution (either the posterior or the likelihood), a
Bayesian integrates under the distribution to identify regions of the parameter 
space containing the largest integrated probability, whereas the Frequentist aims 
to identify the point(s) in the parameter space with the largest probability value, 
at least when adopting the most commonly-used maximum-likelihood or max¬ 
imum a posteriori estimators. Using a thermodynamic analogy, Bayesians fo¬ 
cus on heat whereas Frequentists concentrate on temperature. This difference 
between the schools is far more fundamental than the use of priors, and has im¬ 
mediate practical consequences. By obtaining Monte Carlo samples from the 
distribution, the construction of marginal probability distributions, as preferred 
by the Bayesian, is far more easily and reliably achieved than the construction of 
profile likelihoods, as preferred by the Frequentist. This provides another, very 
practical reason for choosing the Bayesian approach. 


2.1 Bayesian statistics 


In general, the probability $\Pr(A|B)$ is the degree of belief, given $B$, of the truth
of $A$. Bayes’ theorem can be used to change the order of the conditioning:

$$
\Pr(A|B) = \frac{\Pr(B|A)\,\Pr(A)}{\Pr(B)}. \tag{2.1}
$$


In many cases, one wants to use a set of observations or data to infer values of
parameters within a given model. Given a model or hypothesis $H$ with a set
of $N$ free parameters $\theta = \{\theta_i\}$, together with a data-set $D$, Bayes’ theorem
implies that one can write

$$
\Pr(\theta|D,H) = \frac{\Pr(D|\theta,H)\,\Pr(\theta|H)}{\Pr(D|H)}, \tag{2.2}
$$

where $\Pr(\theta|D,H)$ is the posterior probability (density) of the parameters $\theta$,
$\pi(\theta) \equiv \Pr(\theta|H)$ is the prior probability (density), $\mathcal{L}(\theta) \equiv \Pr(D|\theta,H)$
is the probability (density) of the data $D$, for assumed parameter values $\theta$
(called the likelihood when considered as a function of $\theta$), while $\Pr(D|H)$
is the Bayesian evidence,

$$
Z \equiv \Pr(D|H) = \int \Pr(D|\theta,H)\,\Pr(\theta|H)\,\mathrm{d}^N\theta
  = \int \mathcal{L}(\theta)\,\pi(\theta)\,\mathrm{d}^N\theta. \tag{2.3}
$$


Note that this makes the posterior in Eq. 2.2 normalised to unity over the space 
of parameters. 
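
As a minimal numerical illustration of Eqs. 2.2 and 2.3, the sketch below evaluates the evidence by direct quadrature for a hypothetical one-parameter toy problem that is not taken from the analyses in this thesis: a single datum d = 1 with unit Gaussian noise and a uniform prior on [-5, 5]. All function names and numerical values are assumptions chosen purely for illustration.

```python
import math

def likelihood(theta, d=1.0, sigma=1.0):
    """Gaussian likelihood L(theta) = Pr(d | theta) for a single datum d."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((d - theta) / sigma) ** 2) / norm

def prior(theta, lo=-5.0, hi=5.0):
    """Uniform prior density pi(theta) on [lo, hi]."""
    return 1.0 / (hi - lo) if lo <= theta <= hi else 0.0

def integrate(func, lo=-5.0, hi=5.0, n_grid=20000):
    """Midpoint-rule quadrature of func over [lo, hi]."""
    h = (hi - lo) / n_grid
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n_grid))

# Evidence, Eq. 2.3: Z = integral of L(theta) * pi(theta) d(theta).
Z = integrate(lambda t: likelihood(t) * prior(t))

# Posterior, Eq. 2.2, and a check that it is normalised to unity.
posterior = lambda t: likelihood(t) * prior(t) / Z
posterior_norm = integrate(posterior)

print(f"Z = {Z:.6f}, integral of posterior = {posterior_norm:.6f}")
```

Because the prior is much wider than the likelihood, the evidence here is close to the prior density 1/10; direct quadrature of this kind rapidly becomes impractical as the number of parameters grows, which motivates the sampling methods discussed below.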
In parameter estimation, the complete Bayesian inference is embodied in
the posterior distribution of the parameter values. This may be used to obtain 
joint constraints on all parameters simultaneously or constraints on individual 
parameters by the process of integrating out (or marginalising over) all the other 
parameters. In practice, the posterior distribution is explored by drawing sam¬ 
ples from it using (most often) standard Markov chain Monte Carlo (MCMC) 
sampling techniques. Once they have reached equilibrium, such methods pro¬ 
duce a set of samples whose density is proportional to the posterior. In the entire 
process, one need not calculate the evidence to normalise the posterior, since the 
evidence does not depend on the parameters. 
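
The fact that the evidence is not needed for parameter estimation can be seen in a minimal Metropolis sampler, sketched here for a hypothetical one-parameter toy posterior (a unit Gaussian likelihood centred on d = 1 with a uniform prior on [-5, 5]). The proposal width, chain length and burn-in are arbitrary assumptions; this is an illustration of the principle, not the sampler used in this thesis.

```python
import math
import random

def log_unnormalised_posterior(theta, d=1.0, sigma=1.0, lo=-5.0, hi=5.0):
    """log[L(theta) * pi(theta)] up to an additive constant; Z is never computed."""
    if theta < lo or theta > hi:
        return -math.inf  # zero prior probability outside [lo, hi]
    return -0.5 * ((d - theta) / sigma) ** 2

def metropolis(n_steps=20000, step=1.0, seed=0):
    """Random-walk Metropolis chain with a Gaussian proposal of width `step`."""
    rng = random.Random(seed)
    theta = 0.0
    log_p = log_unnormalised_posterior(theta)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        log_p_new = log_unnormalised_posterior(proposal)
        # Accept with probability min(1, ratio); any normalisation cancels here.
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            theta, log_p = proposal, log_p_new
        chain.append(theta)
    return chain

samples = metropolis()[2000:]  # discard an (assumed) burn-in of 2000 steps
mean = sum(samples) / len(samples)
std = math.sqrt(sum((t - mean) ** 2 for t in samples) / len(samples))
print(f"posterior mean ~ {mean:.2f}, posterior std ~ {std:.2f}")
```

Only the ratio of posterior values enters the acceptance step, so the normalising constant Z drops out of the calculation entirely.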

By contrast, the evidence is the key quantity of interest for the problem of 
model selection. Since the evidence may be considered as the average of the 
likelihood over the prior, it provides a natural means of applying Occam’s razor. 
If a model has a highly-peaked likelihood, but there exist large regions of the 
parameter space that are disfavoured, since the likelihood is low there, then the 
evidence of the model will be small. Large evidence values occur for models for 
which a large fraction of the allowed parameter space is likely. Thus, one can
decide which of two models $H_0$ and $H_1$ is preferred by the data $D$ by calculating
the ratio of posterior probabilities

$$
\frac{\Pr(H_1|D)}{\Pr(H_0|D)}
  = \frac{\Pr(D|H_1)\,\Pr(H_1)}{\Pr(D|H_0)\,\Pr(H_0)}
  = \frac{Z_1}{Z_0}\,\frac{\Pr(H_1)}{\Pr(H_0)}, \tag{2.4}
$$

where $Z_i = \Pr(D|H_i)$ is the evidence of model $H_i$ and $\Pr(H_1)/\Pr(H_0)$ is the
a priori probability ratio for the two models. In most problems this prior ratio
is set to unity, but there are cases, most notably in object detection, where one
must set this prior ratio quite carefully to offset the “look elsewhere” effect. One
uses Jeffreys’ scale given in Table 2.1 to interpret the ratio in Eq. 2.4.¹ In practice,
the evidence may also be evaluated using MCMC sampling methods, although the
standard technique of thermodynamic integration (Ó Ruanaidh & Fitzgerald 1996)
typically requires about an order of magnitude more samples than needed for
parameter estimation.
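
The correspondence between the columns of Jeffreys’ scale (Table 2.1) follows from simple arithmetic, sketched below under the assumption stated in the table caption that only two hypotheses compete and that the logarithm is natural. The function names are illustrative only.

```python
import math

def odds_from_log_odds(log_odds):
    """Convert a natural-log odds ratio into an odds ratio."""
    return math.exp(log_odds)

def probability_from_odds(odds):
    """Posterior probability of the favoured model when only two models compete."""
    return odds / (1.0 + odds)

for log_odds in (1.0, 2.5, 5.0):
    odds = odds_from_log_odds(log_odds)
    prob = probability_from_odds(odds)
    print(f"log(odds) = {log_odds:.1f}  odds = {odds:.1f} : 1  Pr = {prob:.3f}")
```

The resulting odds are roughly 2.7 : 1, 12 : 1 and 148 : 1, with probabilities of about 0.73, 0.92 and 0.993, which match the rounded thresholds quoted in the table.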

2.1.1 Check for inconsistency between data-sets:
the $\mathcal{R}$-test

A useful application of Bayesian model selection is in determining whether dif¬ 
ferent data-sets are mutually consistent. In principle, one should always check 
that this is the case before performing a joint analysis using them, although 

¹Throughout this work, log x (i.e. without any subscript) denotes the natural logarithm of x,
which is also commonly denoted by ln x.

in practice such a check is not often undertaken. Indeed, the vast majority of
analyses in SN cosmology do not test whether different surveys are mutually 
consistent before combining them in a joint analysis to determine cosmological 
parameters. 

Adopting a Bayesian model selection approach, we denote by $H_0$ the (null)
hypothesis that the data-sets are mutually consistent. In this case, one would
expect each data-set to prefer broadly the same region(s) of the model parameter
space. Under the (alternative) hypothesis $H_1$ that the data-sets are mutually
inconsistent, one or more of them favour a different region (or regions) of the
parameter space. Simply performing a joint analysis in this case could lead to
erroneous results (see, for example, Appendix A in Feroz et al. 2008 for a
demonstration).

In order to determine which one of these hypotheses is favoured by the data,
one can perform Bayesian model selection between $H_0$ and $H_1$. Using Eq. 2.4
and assuming that hypotheses $H_0$ and $H_1$ are equally likely a priori, this can be
achieved by calculating

$$
\mathcal{R} = \frac{\Pr(D|H_0)}{\Pr(D|H_1)}
            = \frac{\Pr(D|H_0)}{\prod_{i=1}^{n} \Pr(D_i|H_1)}. \tag{2.5}
$$

Here the numerator represents the standard joint analysis of all the data-sets
$D = \{D_1, D_2, \ldots, D_n\}$, whereas the denominator in the final expression
represents the case in which each data-set is analysed separately. It is worth noting,
however, that the second equality in Eq. 2.5 is valid only when one allows for
potential inconsistencies in the preferred values of the full set of model
parameters. Nevertheless, there are often situations where one is interested only in
potential inconsistencies in the preferred values of some subset of the model
parameters. In such cases, one must use only the first equality in Eq. 2.5 and
calculate the denominator $\Pr(D|H_1)$ by performing a joint analysis of $D$ in
which each data-set is assigned its own “private copy” of only those parameters
in the subset of interest. It is clear from Eq. 2.5 that an $\mathcal{R}$-value larger than unity
(or, equivalently, a positive $\log\mathcal{R}$-value) indicates that the (null) hypothesis $H_0$,
that all the data-sets are mutually consistent, is favoured. Otherwise, the
(alternative) hypothesis $H_1$ is preferred, indicating some inconsistency between the
data-sets. One uses Jeffreys’ scale given in Table 2.1 to interpret the value of $\mathcal{R}$.
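
The behaviour of the ratio in Eq. 2.5 can be explored with a small toy calculation. The sketch below is a hypothetical example, not one of the SN analyses in this thesis: each data-set is a list of measurements of a single parameter with unit Gaussian noise, the prior is uniform on [-10, 10], and all evidences are evaluated by midpoint quadrature. Since there is only one parameter, the full parameter set is allowed to differ between data-sets, so the second equality in Eq. 2.5 applies.

```python
import math

def gauss(x, mu, sigma=1.0):
    """Gaussian probability density of x given mean mu."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def evidence(data, lo=-10.0, hi=10.0, n_grid=20000):
    """Z = integral of prod_k L_k(theta) * pi(theta) d(theta), uniform prior on [lo, hi]."""
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        theta = lo + (k + 0.5) * h
        like = 1.0
        for d in data:
            like *= gauss(d, theta)
        total += like
    return total * h / (hi - lo)

def log_r(datasets):
    """log R = log[ Pr(D|H0) / prod_i Pr(D_i|H1) ] for a list of data-sets."""
    joint = evidence([d for ds in datasets for d in ds])
    separate = 1.0
    for ds in datasets:
        separate *= evidence(ds)
    return math.log(joint / separate)

print("consistent pair:   log R =", round(log_r([[0.0], [0.5]]), 2))
print("inconsistent pair: log R =", round(log_r([[-3.0], [3.0]]), 2))
```

Two measurements that agree to within the noise give a positive log R, whereas measurements separated by six standard deviations give a strongly negative value. Note that in this toy problem the value of R depends on the assumed prior width.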

2.1.2 Analysis of potentially inconsistent data-sets: 
hyper-parameters 

One method for accommodating potentially inconsistent data-sets in a joint anal¬ 
ysis is to introduce hyper-parameters that effectively assign a weight to each 
log(odds)    odds         Pr(H_1|D)    Interpretation
< 1.0        < 3 : 1      < 0.75       Inconclusive
1.0          ~ 3 : 1      ~ 0.75       Weak evidence
2.5          ~ 12 : 1     ~ 0.92       Moderate evidence
5.0          ~ 150 : 1    ~ 0.993      Strong evidence


Table 2.1: Jeffreys’ scale for the interpretation of Bayes factors and model
probabilities. The posterior model probabilities for the preferred model are
calculated by assuming only two competing hypotheses.


data-set that is determined directly by its own statistical properties (Hobson,
Bridle, & Lahav 2002). The space of hyper-parameter weights is explored
simultaneously with the space of original model parameters to obtain a joint posterior
distribution. By marginalising over the original model parameters, one obtains
the posterior distribution of the hyper-parameter weights, which may be used to
determine if any inconsistencies exist between different data-sets. Conversely,
one can instead marginalise over the hyper-parameters to recover the posterior
distribution as a function only of the original model parameters. Moreover,
calculation of the Bayesian evidence for the data, with and without the introduction
of hyper-parameters (which we denote by the hypotheses $H_1$ and $H_0$,
respectively), allows us to perform model comparison to determine whether the data
warrant the introduction of weights into the analysis.

When analysing multiple data-sets jointly, inferred values of hyper-parame¬ 
ters which depart significantly from unity indicate the presence of some incon¬ 
sistency. Even in such cases, however, the hyper-parameter approach allows for 
a robust joint analysis of the data-sets. In particular, marginalisation over the 
hyper-parameters allows for the resulting posterior distribution of the original 
model parameters to broaden or even exhibit multi-modality resulting from the 
preference of different data-sets for different regions of the model parameter 
space (see Hobson, Bridle, & Lahav 2002 for more details). 


2.2 Nested sampling and the MultiNest 
algorithm 

It is computationally very demanding to evaluate the multidimensional integral 
in Eq. 2.3. Nested sampling is a Monte Carlo approach, introduced by Skilling 
Figure 2.1: Left panel: An example of a two-dimensional posterior distribution.
Right panel: The function $\mathcal{L}(X)$. The prior volumes $X_i$ are associated with each
likelihood $\mathcal{L}_i$. Credit: Feroz & Hobson (2008).


(2004), that is designed to calculate the evidence efficiently, and which also pro¬ 
duces posterior inferences as a by-product. The method has been extended by 
Feroz & Hobson (2008) and Feroz et al. (2009), who introduced the MultiNest
algorithm, which is able to accommodate posteriors with multiple modes
and/or large (curving) degeneracies. In the following description of the algo¬ 
rithm, I will closely follow the discussion given in these two papers. 

The key innovation in nested sampling is that the multi-dimensional evidence
integral is transformed into a one-dimensional integral. To perform this
transformation one first defines the prior volume $X$ via the differential
relationship $\mathrm{d}X = \pi(\theta)\,\mathrm{d}^N\theta$. Thus one can write $X$ as

$$
X(\lambda) = \int_{\mathcal{L}(\theta) > \lambda} \pi(\theta)\,\mathrm{d}^N\theta, \tag{2.6}
$$

where the domain of integration comprises the region(s) of the parameter space
that lie within the iso-likelihood contour $\mathcal{L}(\theta) = \lambda$. Thus, one can write the
evidence integral, Eq. 2.3, as

$$
Z = \int_0^1 \mathcal{L}(X)\,\mathrm{d}X, \tag{2.7}
$$

where $\mathcal{L}(X)$, which is the inverse of Eq. 2.6, is a monotonically decreasing
function of $X$. Thus, the evidence can be evaluated using one-dimensional
numerical quadrature methods, provided one can evaluate the likelihoods
$\mathcal{L}_i = \mathcal{L}(X_i)$, where $X_i$ is a sequence of decreasing values,

$$
0 < X_M < \cdots < X_2 < X_1 < X_0 = 1, \tag{2.8}
$$

as shown schematically in Figure 2.1. In particular, one can write the evidence
as the weighted sum

$$
Z = \sum_{i=1}^{M} \mathcal{L}_i w_i, \tag{2.9}
$$

where the weights $w_i$ for the simple trapezium rule are given by

$$
w_i = \tfrac{1}{2}\,(X_{i-1} - X_{i+1}). \tag{2.10}
$$

Figure 2.1 gives an illustration of this process for a two-dimensional posterior. 

To perform the sum in Eq. 2.9 one begins at iteration $i = 0$ by drawing $N$
samples from the prior distribution $\pi(\theta)$ (here $N$ denotes the number of
samples, not the number of model parameters). These samples constitute the set of
so-called “live” or “active” points in the nested sampling process. At this initial
stage the volume of the prior is unity, namely $X_0 = 1$. One then calculates the
likelihood of each of the active points and determines which of them has the
lowest value (which I denote by $\mathcal{L}_0$); this point is then discarded from the active set.
In order to maintain the number of active points, one then replaces the discarded
point with a new point that is again drawn from the prior, but is now required
to lie within the iso-likelihood contour $\mathcal{L} = \mathcal{L}_0$. The prior volume contained
within this contour is not known precisely, since it depends on the points in the
original active set. Nonetheless, one may show that the ratio $t = X_1/X_0$ of the
new and original prior volumes is distributed as $\Pr(t) = N t^{N-1}$. As the process
continues, the iterative discarding and replacement of the point with the lowest
likelihood results in the iso-likelihood contour shrinking and the live points
being constrained to ever smaller prior volumes and higher likelihood regions. One
may show that after $i$ iterations the prior volume is $X_i \approx \exp(-i/N)$. The
process is usually concluded by imposing some criterion on the accuracy to which
the evidence has been calculated.
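
The iteration described above can be sketched in a few lines. The example below is a deliberately naive illustration under stated assumptions and is not the MultiNest algorithm: the likelihood is a hypothetical normalised two-dimensional unit Gaussian, the prior is uniform on a square of side 10, the replacement point is found by brute-force rejection sampling from the full prior, the prior volumes are set to their expected values exp(-i/N), and the run stops after a fixed number of iterations rather than on an accuracy criterion.

```python
import math
import random

def log_likelihood(theta):
    """Hypothetical normalised 2D unit Gaussian centred on the origin."""
    x, y = theta
    return -0.5 * (x * x + y * y) - math.log(2.0 * math.pi)

def sample_prior(rng, half_width=5.0):
    """Draw one point from the uniform prior on [-5, 5] x [-5, 5]."""
    return (rng.uniform(-half_width, half_width), rng.uniform(-half_width, half_width))

def nested_sampling(n_live=100, n_iter=700, seed=1):
    """Estimate the evidence Z using the weighted sum of Eq. 2.9."""
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(p) for p in live]
    z = 0.0
    for i in range(1, n_iter + 1):
        # Find and discard the live point with the lowest likelihood.
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        # Trapezium weight of Eq. 2.10 with X_i approximated by exp(-i/N).
        w = 0.5 * (math.exp(-(i - 1) / n_live) - math.exp(-(i + 1) / n_live))
        z += math.exp(logl_min) * w
        # Naive replacement: sample the prior until the point lies inside
        # the iso-likelihood contour (inefficient, but simple).
        while True:
            candidate = sample_prior(rng)
            candidate_logl = log_likelihood(candidate)
            if candidate_logl > logl_min:
                break
        live[worst] = candidate
        live_logl[worst] = candidate_logl
    # Add the contribution of the remaining live points.
    x_final = math.exp(-n_iter / n_live)
    z += x_final * sum(math.exp(l) for l in live_logl) / n_live
    return z

Z_estimate = nested_sampling()
print("log Z estimate:", round(math.log(Z_estimate), 2), " analytic:", round(math.log(0.01), 2))
```

For this toy problem the analytic evidence is very nearly 1/100 (a normalised likelihood multiplied by a prior density of 1/100), so the estimate can be checked directly; the scatter in log Z decreases as the number of live points is increased. The cost of the naive rejection step grows as the inverse of the remaining prior volume, which is precisely the inefficiency that more sophisticated schemes are designed to remove.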

Although nested sampling is designed primarily to calculate the evidence, a 
happy consequence of the method is that the final set of active points, together 
with the “historic” set of discarded points produced during the iterations, can be 
used to obtain posterior inferences on the parameters. Indeed, one may show 
that posterior-weighted samples are obtained by assigning each point the weight

$$
p_i = \frac{\mathcal{L}_i w_i}{Z}. \tag{2.11}
$$
Figure 2.2: The result of applying the ellipsoidal decomposition algorithm in
MultiNest to a set of 1000 points sampled from: two non-intersecting ellip¬ 
soids (left panel); and a torus (right panel). Credit: Feroz & Hobson (2008). 


One can then use these samples in the same way as samples obtained from a
standard MCMC method to calculate parameter means, standard deviations, and 
covariances, or even to construct their marginalised posterior distributions. 
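
As a small illustration of how weighted samples are turned into summary statistics, the sketch below computes a weighted mean and standard deviation for one parameter. The sample values and weights are invented for the example; in practice the weights would be those assigned to the nested sampling points.

```python
import math

def weighted_mean_std(values, weights):
    """Weighted mean and standard deviation of one parameter.

    The weights need not be normalised; they are divided by their sum here.
    """
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)

# Hypothetical samples of a single parameter with their posterior weights.
theta_samples = [1.0, 2.0, 3.0]
sample_weights = [0.25, 0.5, 0.25]
theta_mean, theta_std = weighted_mean_std(theta_samples, sample_weights)
print(theta_mean, theta_std)
```

Marginalised one-dimensional distributions follow in the same way, by histogramming the values of a single parameter using the same weights.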

Nested sampling has been described as only a “meta-algorithm”, since it
leaves unanswered the key question of how, at each iteration $i$, to draw the
required replacement point from the prior within the iso-likelihood contour
$\mathcal{L} = \mathcal{L}_i$.

The MultiNest algorithm (Feroz & Hobson 2008; Feroz et al. 2009) performs
this task using rejection sampling from a multi-ellipsoidal bound tailored
to the current active point set. At each iteration, one performs an
expectation-maximisation process to determine the set of (possibly overlapping) ellipsoids
that encloses the set of $N$ live points in the minimum volume, subject to the
lower limit of the expected prior volume $X_i = \exp(-i/N)$. The new replacement
point is then drawn uniformly from the region enclosed by these ellipsoids, and
is accepted only if it lies within the iso-likelihood contour.

This ellipsoidal decomposition is very flexible and is able to accommodate 
both multimodal structure and degeneracy lines in the target posterior distri¬ 
bution. In particular, for posteriors that contain well-defined and well-separated 
modes, the ellipsoidal decomposition allows one to identify the modes and evolve 
the nested sampling process in each mode separately. In essence, modes are
identified as separate entities if there exist ellipsoid(s) that do not overlap
with any others. An illustration of the ellipsoidal decomposition performed by
MultiNest is given in Figure 2.2. More recently, further developments of
MultiNest have been made to enable even more accurate evaluation of the
evidence (Feroz et al. 2013).
Figure 2.3: A 3-layer NN with 3 inputs, 4 hidden nodes, and 2 outputs. Image
courtesy of Wikimedia Commons.


2.3 Machine-learning and neural networks 

In addition to performing parameter estimation and model selection using Bayes¬ 
ian methods, in this thesis I also use machine-learning techniques, particularly 
for the photometric classification of SNe into their respective types (see Sec¬ 
tion 6.3 and Papers III and V). From the numerous approaches to machine¬ 
learning, I use neural networks (NNs). In particular, I employ the SkyNet 
package, which is a generic NN training algorithm (Graff et al. 2013, 2014). 

An artificial NN is a mathematical model loosely based on the structure of
the brain. A NN consists of groups of nodes that are connected to one another
by directional links that are assigned particular weights. Using these links, each
node processes the information it receives and then passes the result to other nodes.
A great short introduction to NNs is given by MacKay (2003).

In this thesis, I focus entirely on the simplest form of NNs, which are known
as feed-forward networks. In particular, I will consider only 3-layer networks,
which consist of a layer of input nodes, connected to a “hidden” layer, which
itself is then connected to an output layer (see Figure 2.3).

Each node (or perceptron) in the network maps an input vector $x \in \mathbb{R}^n$ to a
scalar output $f(x; w, \theta)$ given by

$$
f(x; w, \theta) = \theta + \sum_{i=1}^{n} w_i x_i, \tag{2.12}
$$

where $\{w_i\}$ and $\theta$ are, respectively, the “weights” and “bias” of the perceptron.
Thus, for a 3-layer NN, the outputs of the nodes in the hidden and output layers
are (Graff et al. 2012)

$$
\text{hidden layer:}\quad h_j = g^{(1)}\big(f_j^{(1)}\big); \quad
f_j^{(1)} = \theta_j^{(1)} + \sum_l w_{jl}^{(1)} x_l, \tag{2.13}
$$

$$
\text{output layer:}\quad y_i = g^{(2)}\big(f_i^{(2)}\big); \quad
f_i^{(2)} = \theta_i^{(2)} + \sum_j w_{ij}^{(2)} h_j, \tag{2.14}
$$

where the indices $l$, $j$ and $i$ indicate, respectively, input, hidden and output
nodes, and $g^{(1)}$ and $g^{(2)}$ are called activation functions. It may be shown that,
for the NN to operate correctly, these functions must obey certain requirements,
namely that they are smooth, monotonic and bounded. I follow Graff et al.
(2012) and use $g^{(1)}(x) = \tanh(x)$ and $g^{(2)}(x) = x$.
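
The forward pass through such a 3-layer network can be written compactly. The sketch below assumes a tanh activation for the hidden layer and a linear activation for the output layer, as described above; the weights, biases and inputs are arbitrary illustrative numbers, and the code is not taken from the SkyNet package.

```python
import math

def perceptron(x, weights, bias):
    """Output of a single node: bias plus the weighted sum of its inputs."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

def forward(x, hidden_w, hidden_b, output_w, output_b):
    """Forward pass: tanh activation in the hidden layer, linear output layer."""
    hidden = [math.tanh(perceptron(x, w, b)) for w, b in zip(hidden_w, hidden_b)]
    return [perceptron(hidden, w, b) for w, b in zip(output_w, output_b)]

# Hypothetical network with 3 inputs, 4 hidden nodes and 2 outputs.
hidden_w = [[1.0, 0.0, 0.0]] * 4          # each hidden node sees only x[0]
hidden_b = [0.0] * 4
output_w = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
output_b = [0.0, -0.5]

outputs = forward([0.5, 2.0, -1.0], hidden_w, hidden_b, output_w, output_b)
print(outputs)
```

With these particular weights the first output is four times tanh(0.5), and the second is simply its bias, which makes the result easy to verify by hand.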

When one “trains” a NN, one determines the values of the weights and bi¬ 
asses that optimise the accuracy of the mapping from the input nodes to the 
output nodes. The existence of a suitable mapping is guaranteed by the “universal
approximation theorem” (Hornik et al. 1990). As one increases the number
of hidden nodes, the accuracy of the mapping typically increases, but so does 
the possibility of overfitting the training data. Nonetheless, the ability of the 
network to learn complicated mappings may be compromised if the number of 
hidden nodes is too low. There is therefore a balance between these competing 
factors and the optimal number of hidden nodes is best determined by comparing 
the fitting error and correlations of NNs with different numbers of such nodes 
trained on the same data. 

In training a NN, we wish to find the optimal set of network weights and
biasses (which together we call the network parameters $a$) that maximise the
accuracy of the predicted outputs. However, one must be careful to avoid
overfitting to the training data at the expense of the accuracy of predictions for
input values the network has not been trained on. The general procedure for
training a NN is to present it with a set of inputs and outputs (or targets)
$\mathcal{D} = \{x^{(i)}, t^{(i)}\}$. Typically around 75% of the set should be used for actual NN
training, while the remainder is used as a validation set of data to determine
convergence and avoid overfitting.

To train the network, one optimises the probability of reproducing the known
training data outputs with respect to the network parameters. For problems of
regression (fitting the model to a function), this yields a log-likelihood for $a$ in
the form of a standard misfit function, given by

$$
\log\mathcal{L}(a; \sigma) = -\frac{n_t n_{\rm out}}{2}\log(2\pi)
  - n_t \sum_{j=1}^{n_{\rm out}} \log\sigma_j
  - \frac{1}{2}\sum_{i=1}^{n_t}\sum_{j=1}^{n_{\rm out}}
    \left[\frac{t_j^{(i)} - y_j(x^{(i)}; a)}{\sigma_j}\right]^2, \tag{2.15}
$$

where $n_t$ is the number of training data, $n_{\rm out}$ is the number of network outputs,
and $y(x^{(i)}; a)$ are the NN’s predicted outputs for the input vector $x^{(i)}$ and
network parameters $a$. The values $\sigma_j$ are hyper-parameters of the NN model that
describe the standard deviation of each of the outputs.

For a classification network that aims to learn the probabilities that a set of
inputs belongs to a set of output classes, the outputs of the network are softmaxed
to become probabilities,

$$
p_j = \frac{e^{y_j}}{\sum_{k=1}^{n_c} e^{y_k}}, \tag{2.16}
$$

where $n_c$ is the number of output classes. The classification log-likelihood is
then given by the (negative) cross-entropy function

$$
\log\mathcal{L}(a) = \sum_{i=1}^{n_t}\sum_{j=1}^{n_c} t_j^{(i)} \log p_j(x^{(i)}; a). \tag{2.17}
$$

In this scenario, the true and predicted output values are probabilities. In the
true outputs, all are zero except for the correct output class, which has a value
of one.
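
The softmax transformation and the resulting classification log-likelihood can be sketched as follows. The raw network outputs and one-hot targets are invented for the example, and subtracting the maximum output before exponentiating is a standard numerical-stability device assumed here rather than something described in the text.

```python
import math

def softmax(y):
    """Convert raw network outputs into probabilities that sum to one."""
    y_max = max(y)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - y_max) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

def classification_log_likelihood(raw_outputs, targets):
    """Sum over training items i and classes j of t_j^(i) * log p_j^(i)."""
    log_like = 0.0
    for y, t in zip(raw_outputs, targets):
        p = softmax(y)
        # Targets are one-hot, so only the true class contributes.
        log_like += sum(tj * math.log(pj) for tj, pj in zip(t, p) if tj > 0)
    return log_like

# Hypothetical two-class example (e.g. SNIa versus non-Ia) with two training items.
raw_outputs = [[0.0, 0.0], [2.0, -1.0]]
targets = [[1, 0], [1, 0]]
print(softmax(raw_outputs[1]), classification_log_likelihood(raw_outputs, targets))
```

The log-likelihood is always non-positive and approaches zero as the network assigns probability close to one to the correct class of every training item.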
Chapter 3


SNe: from sky to catalogue


‘When he shall die. 

Take him and cut him out in little 
stars. 

And he will make the face of 
heaven so fine 

That all the world will be in love 
with night 

And pay no worship to the garish 
sun.’ 

William Shakespeare 

SNe have been observed and recorded since ancient times as transient objects
which flare brightly and appear as “new stars” against the unchanging background
of static stars, only to fade after about a month or so. The term “nova” was
first used by Tycho Brahe (1546-1601) to describe a “new star” which appeared 
in the constellation of Cassiopeia on 11th November 1572, which he observed 
from Herrevads kloster, Sweden. Much later Fritz Zwicky and Walter Baade dif¬ 
ferentiated between “two well-defined types of new stars or novae which might 
be distinguished as common novae and super-novae” (Baade & Zwicky 1934; 
Zwicky 1940). Nevertheless, only since the late 1990s, in the era of telescopes
equipped with charge-coupled device (CCD) detectors, have SNe made their
contribution to cosmology. After the great success of the Calan-Tololo survey, it
became clear that one
can successfully standardise multi-band SN data, based on their reproducible lu¬ 
minosities. The story of realisable “standard cosmological candles” had begun. 

In this chapter, I discuss SN discovery, classification and the study of pro¬ 
genitor models. I give a short historical overview of past SN surveys, summarise 
the current state of the field and give an outlook on future surveys. I also sum¬ 
marise the techniques for standardising SNIa, and their associated shortcomings. 
Figure 3.1: Shapes of the commonly used photometric bands in visible light,
arbitrarily normalised, as a function of wavelength (nm). U, B, V, R, I refer to
standard bands from Bessell (1990), and u, g, r, i, z to the SDSS bands from
Fukugita et al. (1996). Credit: Astier (2012).


3.1 Observational techniques 

Since most of the energy output of a SN is in visible light, the main methods
for collecting information about SNe are photometric imaging and
spectroscopy. Most well-studied SNe are observed using both these
techniques, so it is very useful to remember the rough rule-of-thumb: if the SN
has been detected/measured with a telescope of diameter D, then spectroscopic
follow-up will typically require a telescope with diameter 2D.


3.1.1 Photometry 

In Figure 3.1, one can see two different sets of filters commonly used in SN 
imaging. UBVRI is a “standard” set (Bessell 1990), and u, g, r, i, z is a set of 
specially designed filters used by the Sloan Digital Sky Survey (SDSS, Fukugita 
et al. 1996). Nowadays, imaging in the visible range is performed with silicon CCDs
with a red cutoff at ~ 1.1 μm and 10® pixels per device.

A standard imaging observation is a two step process: (i) several minutes of 
integration, (ii) ~ 1 minute of read out. The image resolution for ground-based 
telescopes is below 1 arcsecond full-width-half-maximum and almost one order 
of magnitude finer in space. 
3.1.2 Spectroscopy

For spectroscopic observations, one most frequently uses fiber optics (Gunn,
Siegmund, & Mannery 2006) or systems of lenses (de Zeeuw et al. 2000), which
are positioned as a narrow slit in the image plane and disperse the light in the per¬ 
pendicular direction. Modern multi-object spectroscopy instruments can collect 
data from up to 1000 objects simultaneously. As in photometric observations, 
putting spectroscopic instruments in space can be very advantageous, since one 
can use a slit-less spectrometer down to very low sky brightnesses. 

3.1.3 Near-infrared observations 

Very promising observations of SNe in the near infrared have been made using 
low band-gap pixelised semiconductor devices, coupled to integrated readout 
electronics (Hodapp et al. 1996). Making this kind of observation from the 
ground has problems since: (i) the atmospheric glow rises with wavelength; (ii) 
there are numerous absorption lines; and (iii) both these effects are not constant 
in time. Near infrared observations also have a problem with ionizing radiation 
affecting the sensors. 

3.2 Classification 

When describing the use of SNe for cosmological parameter inference one usu¬ 
ally means SNIa. However, SNe occur in a very broad range of classes, which 
is still continuing to expand as SNe with previously unseen properties are dis¬ 
covered. 

The first of Baade and Zwicky’s SNe had the broad features characteristic 
of fast moving ejecta, being about 100 times brighter than regular novae and 
not showing evidence of hydrogen lines. This was a detection of a “type I” SN. 
Already in 1941, a different class of SNe had been proposed (Minkowski 1941).
These “type II” SNe were fainter than those originally discovered and had a
hydrogen line in their spectra. In 1985 another unusual SN was detected, this time
characterised by the absence of both hydrogen and silicon lines. This
prompted a separation into subtypes within “type I” SNe: SNIa were defined
as events that have silicon and no hydrogen; type Ib display neither silicon nor
hydrogen, but have a strong helium line; and type Ic have none of these lines.
Figure 3.2 shows the classification scheme for the main SN types. There is a
trend to allocate each somewhat unusual SN event to a new pigeonhole, even when
the peculiarity is only a small variation in the spectra or lightcurves. This should in
most cases be resisted, except if there are clear physical grounds for doing so.


Figure 3.2: SN classification scheme. 


Nonetheless, this is sometimes the case. For example, a new class of
superluminous SNe (SLSNe) has recently come to light. These SNe are tens to
hundreds of times more luminous than “ordinary” SNe, and themselves divide into
three subclasses: SLSN-R, SLSN-I and SLSN-II, according to Gal-Yam (2012). 

Since SNIa are the main focus of this work, I point out another way of
distinguishing between SNIa and non-Ia SNe. SNIa are thermonuclear explosions
(see Section 3.4.1), while all the other types are core-collapse explosions. 

As mentioned above, the silicon line is a signature of a “standard candle”, 
which means that in order to distinguish these events from the variety of SN 
explosions we do need to have spectroscopic measurements. From Figure 3.2 
one can see that, indeed, most of the classification is performed on the basis of 
the presence/absence of some element’s (spectroscopic) lines, but, by contrast, 
separation within type II is based on the shape of SN lightcurves. Since spectro¬ 
scopic observations are very “expensive” it would be of great benefit to devise a 
SN classification method based purely on photometric data (from which one, of 
course, cannot identify the Si absorption line). Many techniques targeted at SN 
photometric classification have been developed, mostly based on some form of 
template fitting (Poznanski et al. 2002; Johnson & Crotts 2006; Sullivan et al. 
2006; Poznanski et al. 2007; Kuznetsova & Connolly 2007; Kunz et al. 2007; 
Rodney & Tonry 2009; Gong et al. 2010; Falck et al. 2010). In such methods, 
the lightcurves in different filters for the SN under consideration are compared 
with those from SNe whose types are well established. Usually, composite tem¬ 
plates are constructed for each class, using high signal-to-noise observations 
of lightcurves of well-studied SNe (see Nugent et al. 2002), or spectral energy 
distribution models of SNe. Such methods can produce good results, but the 
final classification rates are very sensitive to the characteristics of the templates
used. One of the best-known approaches of this type is PSNID (Sako et al. 2008,
2011), which avoids many of the difficulties encountered by simpler methods.

To address the issue of sensitivity to the templates used, Newling et al. 
(2011) instead fit a parametrised functional form to the SN lightcurves. These 
post-processed data are then used in either a kernel density estimation method or 
a “boosting” machine learning algorithm, as discussed by Newling et al. (2011), 
to assign a probability to each classification output, rather than simply
assigning a specific SN type. More recently, Richards et al. (2012) and Ishida & de
Souza (2012) have introduced methods for SN photometric classification that do
not rely on any form of template fitting, but instead combine dimensionality
reduction of the SN data with a machine learning algorithm.
Richards et al. (2012) proposed a method that uses a semi-supervised learning 
approach applied to a database of SNe: first, a low-dimensional representation 
of each SN is constructed from a simultaneous analysis of all the lightcurves 
in the database. A classification model is then built in this low-dimensional 
“feature space” by learning from a set of spectroscopically confirmed training 
samples. This is subsequently used to estimate the type of each unknown SN. 

Subsequently, Ishida & de Souza (2012) proposed the use of Kernel Principal
Component Analysis as a tool to find a suitable low-dimensional
representation of SN lightcurves. In constructing this representation, only a
spectroscopically confirmed sample of SNe is used. Each unlabeled lightcurve is
then projected into this space and a k-nearest neighbour algorithm performs the
classification.
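
The final step of such a scheme, k-nearest neighbour classification in a low-dimensional feature space, can be sketched as below. This is a generic illustration under stated assumptions, not the implementation of Ishida & de Souza (2012): the two-dimensional feature vectors and their labels are invented, and the dimensionality-reduction step that would produce them is omitted.

```python
import math
from collections import Counter

def knn_classify(point, training, k=3):
    """Assign the majority label among the k nearest training points.

    `training` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical spectroscopically confirmed sample in a 2D feature space.
training = [
    ((0.0, 0.0), "Ia"), ((0.1, 0.2), "Ia"), ((0.2, 0.1), "Ia"),
    ((2.0, 2.0), "non-Ia"), ((2.1, 1.9), "non-Ia"), ((1.9, 2.2), "non-Ia"),
]

print(knn_classify((0.1, 0.1), training))
print(knn_classify((2.0, 2.1), training))
```

Choosing an odd k for a two-class problem avoids tied votes; in a real application k and the distance metric would be tuned on the confirmed sample.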

During my PhD studies, I also developed methods for photometric SN
classification. In Paper III, I introduce an algorithm for classifying SNe as
SNIa or non-Ia (see Section 6.3) which does not involve templates. In Paper
V, I further improve this method by including a HNN technique.


3.3 Searching for SNe 

Having discussed the main observational techniques used for SN measurements
and given a brief description of how to distinguish between different SN types,
I now describe how SN surveys are undertaken. Modern surveys typically
require the following steps in order to observe and classify SNe: (i) finding the 
events; (ii) identifying the nature of the events using spectroscopic follow-up; 
(iii) measuring the lightcurve of interesting events in as many bands (see Figure 
3.1) as possible. 

3.3.1 Finding events

The main technique used to search for SNe is image subtraction. Since SNe 
are transient events one can subtract search images taken at different times to 
make non-variable objects disappear. This method was proposed by Hansen,
Jørgensen, & Nørgaard-Nielsen (1987) and first applied to real observations by
Nørgaard-Nielsen et al. (1989). By taking images at a rate of about
twice or thrice per month, one is very likely to detect SNe at their rising epoch, 
which is often early enough to trigger spectroscopic and photometric follow-up 
observations. This type of search greatly benefits from imaging as large a field 
of view as possible. 
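
As a toy illustration of the idea (not the pipeline of any particular survey), the sketch below subtracts a reference image from a later search image and flags pixels whose residual exceeds a threshold. The function name `find_transients`, the threshold and the tiny 3x3 images are assumptions made purely for illustration; a real pipeline would also have to align the images, match their point-spread functions and scale their fluxes before subtracting.

```python
def find_transients(reference, search, threshold):
    """Return (row, col, excess) for pixels brighter in `search` than in `reference`."""
    detections = []
    for r, (ref_row, new_row) in enumerate(zip(reference, search)):
        for c, (ref_pix, new_pix) in enumerate(zip(ref_row, new_row)):
            excess = new_pix - ref_pix  # non-variable sources cancel here
            if excess > threshold:
                detections.append((r, c, excess))
    return detections


# Hypothetical images: a galaxy at the centre and a new source at (0, 2).
reference = [[1.0, 1.0, 1.0],
             [1.0, 9.0, 1.0],
             [1.0, 1.0, 1.0]]
search = [[1.1, 0.9, 6.0],
          [1.0, 9.1, 1.0],
          [0.9, 1.0, 1.1]]

candidates = find_transients(reference, search, threshold=2.0)
```

The bright but constant galaxy pixel drops out of the difference image, whereas the new source survives the cut and would be passed on as a candidate for follow-up.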

Since it is often impossible to have photometric follow-up for all detected
objects, one can implement a rolling search method, which consists of repeatedly
imaging the same sky patch, and using the image sequence not only for
SN detection, but also for measuring the lightcurves. This saves on
photometric follow-up, allows several SNe to be observed in the same field
of view, and produces deep images in long-duration surveys. This technique
has been implemented very successfully in ground-based surveys (see Section 3.3.3).


3.3.2 Follow-up observations 

Spectroscopic follow-up: As was mentioned previously, spectroscopic follow-up
typically requires a telescope of twice the diameter of the one used for imaging.
In terms of spectral resolution, since SN spectra do not have narrow lines,
$\lambda/\delta\lambda \sim 100$ is sufficient for classification purposes, but $\lambda/\delta\lambda \sim 1000$ is
required to measure the redshift sufficiently accurately, which is another important
role of spectroscopic follow-up. These observational requirements mean
that only a small fraction of SNe will have their spectra measured. Thus, it is
common during the image subtraction stage to attempt a pre-classification to
identify the most interesting candidates.

The most challenging part of SN spectroscopy is to perform host galaxy 
subtraction. Since the SNe are point sources, whereas their host galaxies are 
extended, the fraction of host galaxy light mixed with SN light will increase with 
redshift. For photometry this problem is solved by making image subtractions, 
but this solution is infeasible for spectroscopy. One of the current approaches is 
simply to ignore the problem and make a selection against SNe with bright host 
galaxies. Another way to address the problem is to use libraries of observed SNe 
and galaxies to synthesize the observed spectra (Howell et al. 2005). Finally, an
unavoidable downside of spectroscopy is its “single-mindedness”, owing to the
small field of view of most instruments.

Photometric follow-up: In contrast to spectroscopic observations, photometric
follow-up does not require large observational facilities. The only constraint
is that the images have enough stars to serve as photometric and geometric
anchors. Astronomical photometry requires measurements of the source
of interest together with some standard stars using the same instrument. With
more and more rolling search surveys, photometric follow-up will become
increasingly unnecessary, especially with most interest focussing on high-z SNe,
but it will still play a major role in searches for nearby SNe.

3.3.3 Surveys 

Since the early 1990s, many independent teams have explored the sky for SNe. 
Some of them have targeted low redshifts, and others have looked as deep as 
possible with present technologies. And the SN search quest continues ... 

In this section I will describe the major SNIa surveys of the past, present 
and future, and discuss their main observational techniques, the time of data 
collection and the number of SNe observed. 

History of pioneering SN surveys 

The oldest SN survey is the Calan-Tololo survey (Hamuy et al. 1995, 1996),
which observed a sample of “nearby” SNe at redshifts below roughly 0.1.
Photometry and spectroscopy for these SNe were measured on relatively small
telescopes, of 1 m and 2 m in diameter, respectively.

After the great success of Calan-Tololo, in the mid-90s two teams started a 
search for high-z SNe in order to use them as standard candles: the Supernova 
Cosmology Project (SCP) and the high-z team (HZT). By 1995 the projects 
were already yielding their first promising results, and both teams consequently 
received plenty of observational time, including photometric follow-up with the 
Hubble Space Telescope (HST). The results from each team were published in 
Riess et al. (1998) and Perlmutter et al. (1999), with 10 and 42 distant SNe 
respectively, and they came to the same conclusion: that the expansion of the
Universe is accelerating. Such a surprising and profound discovery was obviously
a very strong reason to continue the search for further distant SNe.

The accuracy of the colour measurement was, however, a major problem
in these works, and so measuring accurate lightcurves was essential in order to
improve high-z SN results. From 1997, HST started a new program of photometric
follow-up of ground-based searches, followed in 2002 by an independent
program for finding faint high-z events using the Advanced Camera for Surveys
(Knop et al. 2003; Riess et al. 2004, 2006). This search allowed the collection
of good SN data, with events up to z ~ 1.0.

| Name | ESSENCE | SNLS | SDSS |
|---|---|---|---|
| Imager | CTIO 4-m with the 0.36 deg² Mosaic-II | CFHT 3.6-m equipped with the 1 deg² Megacam | SDSS 2.5-m with its 1.52 deg² camera |
| Bands | R, I | g, r, i, z | u, g, r, i, z |
| z range | [0.3, 0.7] | [0.2, 1.0] | [0.1, 0.4] |
| Location | Northern Chile | Hawaii | Apache Point, US |
| Monitored regions | 36 points (i.e. ~ 10 deg²) | 4 points (i.e. ~ 10 deg²) | 300 deg² equatorial stripe |
| Frequency | every 4th night for 3 months per season | every 4th to 5th night as long as points remained visible | every 2nd night for 3 months per year |
| Period | 2003-2008 | 2003-2008 | 2005-2007 |
| Number of SN events | ~ 100 | ~ 250 | ~ 370 |

Table 3.1: Summary of second generation SN surveys.


Second generation of SN surveys 

Within a decade of the success of Calan-Tololo, a few low-z SN surveys
targeted the nearby sky: CfA (Riess et al. 1999; Jha et al. 2006; Hicken et
al. 2009), the Carnegie Supernova Project (CSP; Contreras et al. 2009) and the
Lick Observatory Supernova Search (LOSS; Li et al. 1999), which provides
events both inside and outside the Hubble flow. The latest completed nearby SN
survey is SNFactory (Copin et al. 2009; Thomas et al. 2009; Bailey et al. 2009).
The data from all these nearby searches now provide an excellent low-redshift
“anchor” for cosmological studies using high-z SNe.

The second generation of high-z surveys includes ESSENCE (Miknaitis et
al. 2007), SNLS (Astier et al. 2005) and SDSS (Holtzman et al. 2008), each of
which used the rolling search technique. Their goal was to increase both the
number and quality of well-measured high-z SNIa. Indeed, many lightcurves
for high-z SNe have been collected, but unfortunately not all detected SNe have
spectroscopic follow-up observations and measured redshifts, since using 4-m
and 8-m class telescopes for all of them was not feasible. A summary of these
surveys is presented in Table 3.1.

Joint analysis 

To place tight constraints on cosmological parameters, one needs a set of SNe at 
a range of redshifts, with coherent distance estimates and a good understanding 
of all systematic errors. This is usually achieved by the analysis of SN compilations
from different surveys, since covering a large redshift range generally
requires different instruments. Any new compilation also typically contains new
events and state-of-the-art techniques for photometric calibration, and delivers
correlated uncertainties.

The most commonly used compilation is “Union”, which has had a few generations:
Union (Kowalski et al. 2008), Union2 (Amanullah et al. 2010) and the
latest, Union2.1 (Suzuki et al. 2012). It has been very successful in making SN
data more accessible for cosmological analysis outside the SN community,
where most users are primarily interested in cosmological constraints rather
than in parameters associated with the SNe themselves. In addition
to the “Union” compilation, one should also mention the most recent compilation,
published in the latest SDSS data release paper by Betoule et al. (2014).
Indeed, the cosmological constraints derived from this latest compilation were shown
in Figure 1.5 in the Introduction.

One does, however, have to be very careful when combining data-sets together.
The joint analysis of combined surveys comes at a price, since one must
first check that the individual SN surveys produce results that are mutually consistent.
If this is not the case, any results derived from their combination may be
misleading. In Paper IV, I present a method to perform this task and apply it to 
existing compilations. 

Current and future SN surveys 

The intermediate Palomar Transient Factory (iPTF) is one of the nearby SN
surveys currently in operation. Built as a continuation of the Palomar Transient
Factory, it has now collected data for about 2200 SNe, of which about 1400
are SNIa. The observations are made in the R- and g-bands and most of them have
spectroscopic follow-up. In 2016, iPTF will be transformed into the Zwicky
Transient Facility (ZTF). Using a reworked version of the same telescope as
iPTF, ZTF will use a new camera: the world’s largest in field of view, at nearly
50 deg². This new camera will enable a full scan of the visible sky every night.

CSP II is another nearby SN search, started in 2011 and anticipated to
operate for five years. The difference between this survey and iPTF is that, rather
than performing optical observations, it collects data in the near infrared and,
together with time-series spectroscopy, aims to achieve a distance precision of
1-2% in order to build a definitive low-redshift reference for future rest-frame infrared
observations of distant SNIa.

Pan-STARRS (Panoramic Survey Telescope and Rapid Response System)
constitutes a new generation of rolling search surveys. Designed at the
University of Hawaii’s Institute for Astronomy, it has a wide-field camera that
can image the whole sky every four nights. So far, only 1.5 years
of results have been obtained (Rest et al. 2013; Scolnic et al. 2013), yielding
146 spectroscopically confirmed SNIa at 0.03 < z < 0.65.

Another high-z survey currently in operation is DES, which saw first light
in 2012 and will continue for five years. This survey operates on the Blanco
4-meter telescope in the Chilean Andes, with a 570-Megapixel digital camera,
DECam. DES surveys a large swathe of the southern sky and will provide deep
images of it. DES plans to obtain well sampled lightcurves for several
thousand SNe. Unfortunately, DES does not have a spectroscopic follow-up
program, and so will have to rely on photometric classification methods to
determine which SNe are of Type Ia.

Finally, I would like to mention the Large Synoptic Survey Telescope, which
will begin operations in 2019, will photograph the entire available sky every
few nights and will deliver data of outstanding quality. If all goes well, science
observations will commence in 2021.

Several mission concepts to measure SNe from space have also been developed;
unfortunately, none of them passed the selection processes. Working from
space is crucial at $z > 1$, because reliable distances should then be measured
in the near infrared, in order to allow a direct comparison with nearby events
measured in blue bands.


3.4 SNIa 


Since SNIa are those used in cosmology, they have become the most studied 
type of SNe. In this Section, I briefly describe the physics of SNIa explosions 
and ways in which they can be standardised for use in cosmology. 

3.4.1 Explosion models

The physics of a SNIa explosion, one of the most energetic events in the Universe,
is still unclear. The most popular hypothesis is that a SNIa is a thermonuclear
explosion of a carbon-oxygen white dwarf in a close binary (Hoyle &
Fowler 1960; Nomoto, Iwamoto, & Kishimoto 1997). Thielemann et al. (2004)
show that the observed amount of energy in SNIa explosions is approximately 
the amount of energy that would be produced in the conversion of carbon and 
oxygen into iron. In order for this process to occur, white dwarfs must be close 
to the Chandrasekhar mass, so that carbon ignition can start. A realistic way for 
a white dwarf to grow to the Chandrasekhar mass is through mass transfer within 
a close binary. Unfortunately, the nature of the stars that can be the donors is not 
clear. Also, no progenitor system before a SN explosion has been conclusively
identified. Another problem with this model is that there is observational evidence
for systems where the progenitor has a mass much lower (e.g. Foley et al.
2009) or higher (e.g. Howell et al. 2006) than the standard Chandrasekhar mass.
Leading models for the SNIa progenitor are single-degenerate (Whelan & Iben 
1973; Nomoto 1982) and double-degenerate (Tutukov & Yungelson 1981; Iben 
& Tutukov 1984; Webbink 1984). There are many good reviews of this topic, 
see e.g. Wang & Han (2012); Hillebrandt & Niemeyer (2000). 

The nature of SNIa explosions is interesting not only from the point of view
of stellar and galaxy evolution, but also for cosmological studies. Early studies
of nearby SNe and numerical simulations of their explosion models have been
used to derive the value of $H_0$ (Branch 1992; Hoeflich & Khokhlov 1996;
Stritzinger & Leibundgut 2004). Unfortunately, this is the only case in which
explosion models have been used to infer cosmological parameters. The reason
for this is the complexity of the explosion and the consequent light production.
SN lightcurve models discussed in Section 3.4.3 can only broadly reproduce
observed lightcurves. The same holds for the models of spectra. Improving the
models could allow us to use them as templates for fitting data, which would
reduce the distance scatter, and give insights into redshift-dependent systematic
biases in distances.

3.4.2 SN lightcurves 

A SNIa emits most of its energy in visible light, with some energy in the
near UV and near IR. As was noted early on, SNIa have reproducible lightcurves
(Minkowski 1964), as shown in the left panel of Figure 3.3, but with very different
behaviour in different colours; the example of SN2006D is shown in the
right panel of Figure 3.3.

Figure 3.3: Left panel: SN lightcurves in the B-band. Credit: Perlmutter (2003).
Right panel: Lightcurves in different bands. Credit: Astier (2012). 


In the left panel of Figure 3.3 one can see that SNIa exhibit variability in
the width of their lightcurves. Accounting for time dilation in order to compare
restframe widths does not completely eliminate this difficulty. The problem
was “solved” in 1993 using a small sample of well-measured events (Phillips
1993). The Phillips relation quantifies how SNe that are intrinsically brighter
have lightcurves that decline more slowly from their maximum. This relation
is central to the use of SNIa for measuring cosmological distances. Moreover,
in the B-band, this variability of lightcurves is typically described by stretching
the time axis of a single lightcurve template (Perlmutter et al. 1997; Goldhaber
et al. 2001). Goldhaber et al. (2001) and Conley et al. (2006) find that the rise
and fall timescales vary together, while Hayden et al. (2010) find them to be
essentially independent.

SNIa also exhibit variability in their colours (measured, e.g. at maximum) 
even at a fixed decline rate (Guy et al. 2010; Blondin et al. 2009). The source 
of colour variability, which is unrelated to the decline rate, is still unclear. One 
of the options is that SNe intrinsically have different colors (e.g. Foley & Kasen 
2011), while another hypothesis relates it to extinction by dust in the host galaxy. 
Most likely the truth is a mixture of both and possibly some other astronomical 
reasons that we do not yet understand. 

To use SNIa for cosmological searches, their fluxes must be expressed in the
same way, and for this purpose one uses lightcurve fitters.

3.4.3 Standardization of SNe

Using empirical lightcurve models not only “standardises” SN lightcurves, but 
also “compresses” the photometric data characterizing an event. The aim of all 
the lightcurve models can be summarised as the derivation of a distance, for 
which one needs a brightness (anything that scales linearly with the observed 
flux), the decline rate and colour. 

In the original work of Phillips (1993), the data were very
well sampled, so there was no need for an explicit model. Phillips built smooth
discrete templates with different $\Delta m_{15}$ values, where this quantity denotes the
decline rate measured as the magnitude difference between peak and 15 (rest-frame)
days later. Hamuy et al. (1995) generalised this method to allow it to
fit much sparser observations. This method enjoyed broad usage and the
resulting templates have been continuously updated (Phillips et al. 1999; Germany
et al. 2004; Prieto, Rest, & Suntzeff 2006). Recently, the SNooPy model
(Burns et al. 2011) revisited the $\Delta m_{15}$ paradigm, extending it into
the near IR. Another method used to account for the decline-rate variation is
the “stretch” paradigm (Perlmutter et al. 1997), which stretches the time axis
of these templates in the B and V bands. A disadvantage of all these types of
models is that they rely on lightcurve templates.
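
The link between the two paradigms can be illustrated with a small sketch: stretching the time axis of a template changes the decline rate measured 15 rest-frame days after maximum. The template values below are invented for illustration (chosen so that the unstretched template declines by 1.1 mag in 15 days), and the function names `interpolate`, `template_dmag` and `delta_m15` are hypothetical; published templates are far more finely sampled.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation; clamps outside the tabulated range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical B-band template: rest-frame days from maximum -> magnitudes
# below peak (assumed values, for illustration only).
TEMPLATE_PHASE = [-15.0, -10.0, 0.0, 15.0, 30.0]
TEMPLATE_DMAG = [3.0, 1.0, 0.0, 1.1, 2.5]


def template_dmag(phase, stretch=1.0):
    """Magnitude relative to peak for a template whose time axis is stretched."""
    # stretch > 1 maps a given rest-frame phase to an earlier template phase,
    # which gives a broader, more slowly declining lightcurve.
    return interpolate(phase / stretch, TEMPLATE_PHASE, TEMPLATE_DMAG)


def delta_m15(stretch=1.0):
    """Decline in magnitudes between peak and 15 rest-frame days later."""
    return template_dmag(15.0, stretch) - template_dmag(0.0, stretch)
```

In this toy setup a larger stretch gives a smaller decline rate, which is the sense in which the two parametrisations describe the same lightcurve-width variation.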

Spectral Adaptive Lightcurve Template 2 (SALT2) 

The most commonly used empirical lightcurve model is currently SALT2 (Guy
et al. 2007; Mosher et al. 2014). The main idea behind this method is that all the
measurements are fit using a function of the phase $p$ and the wavelength $\lambda$,

$F(p, \lambda) = x_0 \left[ M_0(p, \lambda) + x_1 M_1(p, \lambda) \right] \exp\left[ c \, CL(\lambda) \right]$, (3.1)

where $M_0$ is the mean SNIa spectral energy distribution, $M_1$ accounts for lightcurve
width variations, $CL$ is a color law which incorporates any wavelength-dependent
color variations and is independent of epoch, $x_1$ is the lightcurve
shape parameter, $x_0$ is the overall flux scale and $c$ is the peak B-V color. The
first three of the above quantities are SN-independent and describe all SNIa,
whereas the last three parameters are unique to each SN. No assumptions about
dust or extinction laws are made a priori. A cartoon schematic of the SALT2
training process is shown in Figure 3.4.
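
To make the structure of Eq. 3.1 concrete, the sketch below evaluates the model flux for given SN parameters. It is not the SALT2 code: the surfaces `toy_m0` and `toy_m1` and the colour law `toy_colour_law` are invented placeholders (in SALT2 these are tabulated quantities obtained from the training), and `salt2_flux` is a hypothetical name.

```python
import math


def salt2_flux(phase, wavelength, x0, x1, c, m0, m1, colour_law):
    """Evaluate Eq. 3.1 given callables m0, m1 (phase, wavelength) and colour_law (wavelength)."""
    surface = m0(phase, wavelength) + x1 * m1(phase, wavelength)
    return x0 * surface * math.exp(c * colour_law(wavelength))


# Placeholder ingredients (assumptions, not trained SALT2 surfaces).
def toy_m0(phase, wavelength):
    return math.exp(-0.5 * (phase / 10.0) ** 2)


def toy_m1(phase, wavelength):
    return 0.01 * phase * math.exp(-0.5 * (phase / 10.0) ** 2)


def toy_colour_law(wavelength):
    return (wavelength - 4400.0) / 4400.0


flux_at_peak = salt2_flux(0.0, 4400.0, x0=2.0, x1=0.5, c=0.1,
                          m0=toy_m0, m1=toy_m1, colour_law=toy_colour_law)
```

The separation is visible in the signature: `m0`, `m1` and `colour_law` are shared by all SNe, while `x0`, `x1` and `c` are fitted per object.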

SALT2 also allows one to accommodate the intrinsic variability of SNIa. To
achieve this, the model includes sources of additional uncertainty. These are
a broadband magnitude scatter $k(\lambda)$, which deals with the color law and the $c$
parameter, and a spectral “error snake” $S(p, \lambda)$ to account for the $x_1$ parameter.

Figure 3.4: The three stages of the SALT2 model training process. Each stage
calculates best-fit model parameters using successively improved estimates of 
model uncertainties. 


In order to calculate and include these effects, SALT2 performs three iterations
of $\chi^2$-minimisation, as illustrated in Figure 3.4.

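As a result, for each SN, SALT2 reports best-fit values $\hat{x}_0$, $\hat{x}_1$, $\hat{c}$, the redshift
$\hat{z}$, and the covariance matrix

$\hat{C}_{\rm SALT2} = \begin{pmatrix} \sigma_{x_0}^2 & \sigma_{x_0,x_1} & \sigma_{x_0,c} \\ \sigma_{x_0,x_1} & \sigma_{x_1}^2 & \sigma_{x_1,c} \\ \sigma_{x_0,c} & \sigma_{x_1,c} & \sigma_c^2 \end{pmatrix}$. (3.2)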


Let us denote the result of the SALT2 lightcurve fitting procedure as 

$D_{{\rm SALT2},i} = \{\hat{z}_i, \hat{x}_{0,i}, \hat{x}_{1,i}, \hat{c}_i, \hat{C}_{i,{\rm SALT2}}\}$, (3.3)

where i runs through the n SNe in the sample. These outputs can be used to 
obtain distance estimates for the SNe using Eq. 4.5. 

Other lightcurve models and distance estimators 

As mentioned previously, the SALT2 model is one of the most frequently used
SN lightcurve fitters. Together with SALT (Guy et al. 2005; Astier et al. 2006)
and SiFTO (Conley et al. 2008), it mostly uses the “stretch” paradigm. SALT
is simply an earlier version of SALT2, whereas SiFTO models the lightcurves from
spectral energy distribution templates. SiFTO uses the fact that SN lightcurves
have different shapes in different bands (see the right panel of Figure 3.3). As in
SALT2, these methods do not give direct distance estimates; instead, one needs to use
equations of the form of Eq. 4.5. One can also perform analyses with combined
SALT2/SiFTO fits, see e.g. Guy et al. (2010).

Another frequently used model is the Multicolor Light Curve Shape (MLCS;
Riess, Press, & Kirshner 1996) model, which consists of a one-parameter family
of lightcurve shapes in the standard visible bands B, V, R and I (see Figure 3.1).
This complete model of SN lightcurves depends on three parameters (plus a reference
date): a distance modulus, a brightness offset, and an extinction value.
Initially trained on 12 events, it was later updated to about 100 events. The model
has also been extended towards the blue by adding the U-band. This second version
is named MLCS2k2 (Jha, Riess, & Kirshner 2007).

Among other methods are the Bayesian Adapted Template Match method 
(Tonry et al. 2003), the so-called “Bailey ratio” (Bailey et al. 2009), the CMagic 
distance estimator (Wang et al. 2003), the Gaussian-process regression method 
(Kim et al. 2013), plus many more. 

Unfortunately, not only is none of these methods the “correct” one, but the
cosmological parameter constraints one obtains using the same data but different
lightcurve fitting methods are significantly different. SALT2 and MLCS2k2
were compared by Kessler et al. (2009), who find that when
the restframe U-band is taken into account, systematically different distances are
obtained. In Guy et al. (2010), SALT2 and SiFTO are compared and the differences
obtained are acceptable if the compilation contains of the order of a few
hundred events.

3.5 Type non-Ia SNe in cosmological analyses

Usually non-Ia SNe are an unwanted contaminant in SNIa compilations. Nevertheless,
there is one type of non-Ia SNe that can be used for cosmological
analyses. Type II Plateau SNe (SNII-P) can be used as standard candles to
determine luminosity distances, although only for smaller distances and with
lower accuracy than those of SNIa. Despite this shortcoming, SNII-P explosions
are better understood than SNIa. Another advantage of SNII-P is that they have
been found only in late-type galaxies, whereas SNe Ia have been observed both
in late- and early-type galaxies. Consequently, one would expect distance estimates
from SNII-P to suffer less from biases resulting from different galactic
environments. This difference in systematic effects allows SNII-P data to complement
SNIa analyses (D’Andrea et al. 2010).

Chapter 4


Cosmological parameter 
inference using SNe 


‘All my Bayesian friends have 
objected at this point that there’s 
no such concept as bias in 
Bayesian analysis. It is true that 
there is no meaningful, exact 
definition of bias except in a 
frequentist sense. What I mean 
here is that the answer [... ] is 
usually wrong, and in a given 
direction. The dictionary calls this 
“bias”.’ 

Stephen Gull 


Undoubtedly the greatest accomplishment of SN studies over the past 20 
years is the discovery of the current accelerating expansion of the Universe. This 
was recognised by the award of the Nobel Prize in Physics in 2011, which was 
divided one-half to Saul Perlmutter, the other half jointly to Brian P. Schmidt 
and Adam G. Riess. 

In order to constrain cosmological parameters, one needs to make an inference
from the observational measurements. It is usual to adopt a two-stage
approach in which SN observations are first analysed using a lightcurve fitting
program, the outputs of which are then used in a second stage of inference for
the cosmological parameters. The first step of this process was already described
in Section 3.4.3. In this chapter, I outline the commonly-used methods of cosmological
inference, assessing their advantages and disadvantages. Since the
SALT2 algorithm is the most widely used for SN fitting and I made use of it
in my papers, I will also use it here. However, the results and conclusions presented
in this Chapter are not specific to SALT2 and can be easily generalised
to the outputs of other lightcurve fitting codes like SiFTO, MLCS2k2, etc.


4.1 Comparing theory and observations 


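As discussed in Section 3.4.3, on the observational side, for each SN, the SALT2
algorithm reports best-fit values $\hat{x}_0$, $\hat{x}_1$, $\hat{c}$, the redshift $\hat{z}$, and the covariance
matrix $\hat{C}_{\rm SALT2}$ defined in Eq. 3.2. This covariance matrix does not include
covariances involving $m_B^*$. Since $m_B^* = -2.5 \log_{10} x_0$ plus a constant, we use

$\sigma_{m_B^*} = \frac{5 \sigma_{x_0}}{2 x_0 \ln 10}, \qquad \sigma_{m_B^*,x_1} = -\frac{5 \sigma_{x_0,x_1}}{2 x_0 \ln 10}$, (4.1)

$\sigma_{m_B^*,c} = -\frac{5 \sigma_{x_0,c}}{2 x_0 \ln 10}$ (4.2)

to construct the covariance matrix

$\hat{C} = \begin{pmatrix} \sigma_{m_B^*}^2 & \sigma_{m_B^*,x_1} & \sigma_{m_B^*,c} \\ \sigma_{m_B^*,x_1} & \sigma_{x_1}^2 & \sigma_{x_1,c} \\ \sigma_{m_B^*,c} & \sigma_{x_1,c} & \sigma_c^2 \end{pmatrix}$. (4.3)

Let us denote the result of the SALT2 lightcurve fitting procedure, with the
above rescaled covariance matrix, as

$D_i = \{\hat{z}_i, \hat{m}^*_{B,i}, \hat{x}_{1,i}, \hat{c}_i, \hat{C}_i\}$, (4.4)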

where i runs through the n SNe in the sample. 

The data $D_i$ define the “observed” distance modulus for the $i$th SN as

$\mu_i^{\rm obs}(\alpha, \beta, M_0) = \hat{m}^*_{B,i} - M_0 + \alpha \hat{x}_{1,i} - \beta \hat{c}_i$. (4.5)

This expression contains three unknown parameters, all of which are assumed
global, i.e. having the same value for all SNIa. These parameters are: $M_0$, the
B-band absolute magnitude of the SN, and $\alpha$, $\beta$, which are nuisance parameters
controlling the stretch and colour corrections.
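
The two operations of this section can be sketched in a few lines. The sketch assumes that $m_B^*$ differs from $-2.5 \log_{10} x_0$ only by a constant, so that linear error propagation rescales the first row and column of the SALT2 covariance matrix; the function names `mb_covariance` and `mu_obs` and the numerical inputs are hypothetical.

```python
import math


def mb_covariance(x0, cov_salt2):
    """Convert the covariance of (x0, x1, c) into that of (m_B*, x1, c).

    Assumes m_B* = -2.5 log10(x0) + const, so d m_B* / d x0 = -2.5 / (x0 ln 10).
    """
    k = -2.5 / (x0 * math.log(10.0))
    scale = (k, 1.0, 1.0)  # Jacobian is diagonal: only the first variable changes
    return [[scale[i] * scale[j] * cov_salt2[i][j] for j in range(3)]
            for i in range(3)]


def mu_obs(m_b, x1, c, alpha, beta, m0):
    """Observed distance modulus of Eq. 4.5."""
    return m_b - m0 + alpha * x1 - beta * c


# Hypothetical SALT2 output for one SN.
cov = [[1.0e-12, 2.0e-8, -1.0e-9],
       [2.0e-8, 0.25, 0.001],
       [-1.0e-9, 0.001, 0.0025]]
cov_mb = mb_covariance(1.0e-5, cov)
mu = mu_obs(24.0, 0.5, 0.1, alpha=0.12, beta=3.2, m0=-19.3)
```

Because the derivative of the magnitude with respect to $x_0$ is negative, the covariances of $m_B^*$ with $x_1$ and $c$ change sign relative to those of $x_0$, while the variance stays positive.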
On the theory side, in a Friedmann-Robertson-Walker cosmology, if an object
has an absolute luminosity $L$ and one measures its flux to be $F$, then its
luminosity distance is given by¹

$D_L \equiv \sqrt{\frac{L}{4 \pi F}}$. (4.6)

The distance modulus is defined by $\mu = m - M_0$, where $m = -2.5 \log_{10} F$ is
the apparent magnitude of the object and $M_0$ is its absolute magnitude. Hence,
$\mu$ can be written as

$\mu = 5 \log_{10}\left(\frac{D_L}{\rm Mpc}\right) + 25$, (4.7)

where the constant offset is included such that one satisfies the convention that
$\mu = 0$ at $D_L = 10$ pc.

In terms of the cosmological parameters $\mathscr{C} = \{\Omega_{m,0}, \Omega_{de,0}, H_0, w\}$, an object
at a redshift $z$ has the luminosity distance

$D_L(z, \mathscr{C}) = \frac{c}{H_0} \frac{(1+z)}{\sqrt{|\Omega_{k,0}|}} \, S\left(\sqrt{|\Omega_{k,0}|} \, I(z)\right)$, (4.8)

where

$I(z) \equiv \int_0^z \frac{d\bar{z}}{\sqrt{(1+\bar{z})^3 \Omega_{m,0} + (1+\bar{z})^{3(1+w)} \Omega_{de,0} + (1+\bar{z})^2 \Omega_{k,0}}}$, (4.9)

in which (neglecting the present-day energy density in radiation) $\Omega_{k,0} = 1 -
\Omega_{m,0} - \Omega_{de,0}$ and $S(x) = x$, $\sin x$ or $\sinh x$ for a spatially-flat ($\Omega_{k,0} = 0$),
closed ($\Omega_{k,0} < 0$) or open ($\Omega_{k,0} > 0$) universe, respectively. A cosmological
constant corresponds to the special case in which the dark-energy equation-of-state
parameter has the value $w = -1$; in this case, the present-day density
parameter is usually denoted by $\Omega_{\Lambda,0}$.
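
A minimal numerical sketch of the theoretical distance modulus follows, assuming the standard Friedmann-Robertson-Walker expressions for the luminosity distance in terms of $I(z)$ and $S(x)$, with $H_0$ in km/s/Mpc and distances in Mpc. The function names, the use of Simpson's rule and the number of integration steps are illustrative choices, not those of any particular cosmology code.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def inv_e(z, om, ode, w):
    """Integrand of I(z), with Omega_k = 1 - Omega_m - Omega_de."""
    ok = 1.0 - om - ode
    return 1.0 / math.sqrt(om * (1.0 + z) ** 3
                           + ode * (1.0 + z) ** (3.0 * (1.0 + w))
                           + ok * (1.0 + z) ** 2)


def luminosity_distance(z, om, ode, h0, w=-1.0, steps=1000):
    """Luminosity distance in Mpc; `steps` must be even (Simpson's rule)."""
    dz = z / steps
    total = inv_e(0.0, om, ode, w) + inv_e(z, om, ode, w)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * inv_e(k * dz, om, ode, w)
    integral = total * dz / 3.0
    ok = 1.0 - om - ode
    if abs(ok) < 1e-12:      # spatially flat: S(x) = x
        transverse = integral
    elif ok < 0.0:           # closed: S(x) = sin x
        root = math.sqrt(-ok)
        transverse = math.sin(root * integral) / root
    else:                    # open: S(x) = sinh x
        root = math.sqrt(ok)
        transverse = math.sinh(root * integral) / root
    return C_KM_S / h0 * (1.0 + z) * transverse


def distance_modulus(z, om, ode, h0, w=-1.0):
    """mu = 5 log10(D_L / Mpc) + 25."""
    return 5.0 * math.log10(luminosity_distance(z, om, ode, h0, w)) + 25.0
```

Two analytic limits give a quick check of such a routine: for a matter-only flat universe $I(z) = 2[1 - (1+z)^{-1/2}]$, and for an empty universe $D_L = (c/H_0)[(1+z)^2 - 1]/2$.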

To compare the observations and theory, one could therefore identify a set
of objects ($i = 1, 2, \ldots, N$) whose absolute magnitudes are known a priori
(standard candles), measure their distance moduli and redshifts, and consider
the differences (often termed Hubble diagram residuals)

$\Delta\mu_i = \mu_i^{\rm obs}(\alpha, \beta, M_0) - \mu(\hat{z}_i, \mathscr{C})$. (4.10)

These could then be used to place constraints on the cosmological parameters
plus other parameters of interest, such as $\alpha$, $\beta$ and $M_0$.

¹ For simplicity, I refer here to bolometric quantities, i.e. integrated over all frequencies, rather
than to fluxes in specific frequency bands.
| Parameter | Value used for simulations |
|---|---|
| $\sigma_{\rm int}$ | 0.01 |
| $\alpha$ | 0.12 |
| $\beta$ | 3.2 |
| $M_0$ | -19.3 |
| $\sigma_{m_B^*}$ | 0.05 |
| $\sigma_{x_1}$ | 0.5 |
| $\sigma_c$ | 0.05 |
| $\sigma_z$ | 0.0001 |
| $\Omega_{m,0}$ | 0.3 |
| $h$ | 0.7 |

Table 4.1: Parameter values used in the generation of simulated SNIa data, assuming
a spatially-flat universe.


One should note, however, that if the standard candles all have the same
absolute magnitude $M_0$, but this value is unknown, then it is degenerate with
the Hubble constant $H_0$, as is evident from Eqs. 4.6 and 4.8. Perhaps more importantly,
Nature has neglected even to provide such a set of “uncalibrated”
standard candles. Instead, we must make do with SNIa for which the absolute
magnitudes vary, but can nonetheless be standardised, as discussed in Section
3.4.

4.2 Toy simulations of SN data 

In order to discuss the advantages and disadvantages of different inference methods,
I apply them to toy simulated SNIa data. In these simulations I assume
that the off-diagonal elements of the covariance matrix in Eq. 4.3 are zero; this
makes a negligible difference since the off-diagonal elements are very small in
practice.

To make the simulation, the procedure below is performed for each SN ($i =
1, 2, \ldots, N_{\rm SN}$), and the values of the various parameters used in the simulations
are given in Table 4.1.

1. The redshift is drawn independently from $z_i \sim U(0, 1)$, where $U(a, b)$
denotes a uniform distribution in the range $[a, b]$.

2. The predicted $\mu_i(z_i, \mathscr{C})$ is calculated using Eq. 4.7.

3. The hidden variables $M_i$, $x_{1,i}$ and $c_i$ are drawn from the respective distributions
$M_i \sim N(M_0, \sigma_{\rm int}^2)$, $x_{1,i} \sim U(-5.0, 3.0)$ and $c_i \sim U(-0.2, 0.3)$,
where $N(\mu, \sigma^2)$ denotes a normal (Gaussian) distribution with mean $\mu$
and variance $\sigma^2$.

4. The value of $m^*_{B,i}$ is calculated using the Phillips relation $m^*_{B,i} = \mu(z_i, \mathscr{C}) +
M_i - \alpha x_{1,i} + \beta c_i$.

5. The simulated observational data are obtained by drawing independently
from the distributions $\hat{z}_i \sim N(z_i, \sigma_z^2)$, $\hat{m}^*_{B,i} \sim N(m^*_{B,i}, \sigma_{m_B^*}^2)$, $\hat{x}_{1,i} \sim
N(x_{1,i}, \sigma_{x_1}^2)$ and $\hat{c}_i \sim N(c_i, \sigma_c^2)$.

For my simulations, I generated 100 independent data-sets, each one containing
200 SNe. In my analyses, I assume $h = 0.7$ (as is established to a few
per cent accuracy by a number of cosmological probes) and vary $\Omega_{m,0}$, $M_0$, $\alpha$
and $\beta$.
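
The five simulation steps can be sketched as follows. This is an illustrative reimplementation, not the code used for the results in this chapter: the dictionary keys, function names and the midpoint-rule integration are assumptions, and the distance modulus is computed for a flat universe with a cosmological constant only.

```python
import math
import random

C_KM_S = 299792.458  # speed of light in km/s

# Values from Table 4.1 (the key names are hypothetical).
PARAMS = {"sigma_int": 0.01, "alpha": 0.12, "beta": 3.2, "M0": -19.3,
          "sigma_mB": 0.05, "sigma_x1": 0.5, "sigma_c": 0.05,
          "sigma_z": 0.0001, "omega_m": 0.3, "h": 0.7}


def mu_flat(z, omega_m, h, steps=200):
    """Distance modulus in a flat universe with a cosmological constant."""
    dz = z / steps
    # Midpoint rule for I(z) with Omega_k = 0 and w = -1.
    integral = sum(
        dz / math.sqrt(omega_m * (1.0 + (k + 0.5) * dz) ** 3 + 1.0 - omega_m)
        for k in range(steps))
    d_l = C_KM_S / (100.0 * h) * (1.0 + z) * integral  # in Mpc
    return 5.0 * math.log10(d_l) + 25.0


def simulate_sn(rng, p=PARAMS):
    """Simulate the observed (z, mB, x1, c) of one SN."""
    z = rng.uniform(0.0, 1.0)                              # step 1
    mu = mu_flat(z, p["omega_m"], p["h"])                  # step 2
    abs_mag = rng.gauss(p["M0"], p["sigma_int"])           # step 3
    x1 = rng.uniform(-5.0, 3.0)
    c = rng.uniform(-0.2, 0.3)
    m_b = mu + abs_mag - p["alpha"] * x1 + p["beta"] * c   # step 4
    return {"z": rng.gauss(z, p["sigma_z"]),               # step 5
            "mB": rng.gauss(m_b, p["sigma_mB"]),
            "x1": rng.gauss(x1, p["sigma_x1"]),
            "c": rng.gauss(c, p["sigma_c"])}


def simulate_dataset(n_sn=200, seed=0):
    """One data-set of n_sn simulated SNe; the seed makes it reproducible."""
    rng = random.Random(seed)
    return [simulate_sn(rng) for _ in range(n_sn)]
```

Calling `simulate_dataset(200, seed=k)` for 100 different seeds would give an ensemble of the size described above.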

4.3 Naive definition of the likelihood 

Assuming that the Hubble diagram residuals are Gaussian-distributed, one often
defines the likelihood as

$\mathcal{L} = \prod_{i=1}^{N_{\rm SN}} \frac{1}{\sqrt{2\pi} \, \sigma_i(\alpha, \beta, \sigma_{\rm int})} \exp\left\{ -\frac{[\mu_i^{\rm obs}(\alpha, \beta, M_0) - \mu(\hat{z}_i, \mathscr{C})]^2}{2 \sigma_i^2(\alpha, \beta, \sigma_{\rm int})} \right\}$. (4.11)

In this expression, I have included the dependence on (only) the parameters to
be fitted. Here the total dispersion $\sigma_i^2$ is given by

$\sigma_i^2 = (\sigma^z_{\mu,i})^2 + \sigma_{\rm int}^2 + \sigma_{{\rm fit},i}^2(\alpha, \beta)$. (4.12)

The three components arise as follows:

1. Uncertainties in the peculiar velocity and/or the spectroscopic measurements
of either the host galaxy or SNIa itself lead to an uncertainty $\sigma_{z,i}$ in
its estimated redshift, which in turn induces an error $\sigma^z_{\mu,i}$ in the distance
modulus.

2. Even after correction for stretch and colour, there remains some global
variation in the SNIa absolute magnitudes. The quantity $\sigma_{\rm int}$ contains all
of these intrinsic dispersion errors.

3. The uncertainty in the fitting of the parameters by SALT2 is given by

$\sigma_{{\rm fit},i}^2 = \psi^T \hat{C}_i \psi$, (4.13)

where the transposed vector $\psi^T = (1, \alpha, -\beta)$ and $\hat{C}_i$ is the covariance
matrix given in Eq. 4.3.

Additional errors, such as those due to lensing or Milky Way dust extinction can 
also be added in at this stage, but I do not consider such errors here. 

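For convenience, one often writes Eq. 4.11 in the form:

$-\ln \mathcal{L} = {\rm const} + \sum_{i=1}^{N_{\rm SN}} \ln \sigma_i + \sum_{i=1}^{N_{\rm SN}} \frac{[\mu_i^{\rm obs} - \mu(\hat{z}_i, \mathscr{C})]^2}{2 \sigma_i^2}$. (4.14)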

As shown in Figure 4.1, maximising this function unfortunately leads to
biased results for some of the parameters. This is particularly acute for the $\beta$
parameter. This effect stops us using the likelihood in Eq. 4.11, although it has
been employed and investigated in some previous analyses (D’Agostini 2005).

It is worth pointing out that, in the case when the error bars on the color and
stretch parameters are small, the biases “disappear”, as one can see in Figure
4.2.

Gull (1989) identified the origin of the problem, albeit in the conceptually
more straightforward context of fitting a straight line

$$y = ax + b, \tag{4.15}$$

to a set of data $\{x_i, y_i\}$ where the $x_i$ and $y_i$ both have an associated
measurement error and $a$, $b$ are the parameters of the model. One can see that this
linear toy model is very similar to the relation between the parameters in Eq. 4.5,
where the parameters $a$, $b$ of the linear model correspond to the parameters
$\alpha$, $\beta$. I will return to Gull’s findings in Section 4.5, but I first
describe the more pragmatic, empirical approach adopted by the SN community to
address the problem of bias associated with the use of Eq. 4.5.


4.4 The standard $\chi^2$-method


The precise methods used for the estimation of cosmological parameters differ
between SN consortia, but the main elements are common to all. Central to the
approach is the $\chi^2$-statistic, which is defined as

$$\chi^2 = \sum_{i=1}^{N_{\rm SN}}
\frac{\left[\mu_i^{\rm obs}(\alpha,\beta,M_0) - \mu_i(\mathscr{C})\right]^2}
{\sigma_i^2(\alpha,\beta,\sigma_{\rm int})}, \tag{4.16}$$

where $\sigma_i^2$ is that given in Eq. 4.12.

[Figure 4.1 panel titles: $\Omega_{m,0}$: mean = 0.27006, std = 0.036556;
$M_0$: mean = -19.277, std = 0.029502; $\alpha$: mean = 0.11395, std = 0.0054062;
$\beta$: mean = 2.8673, std = 0.078988]

Figure 4.1: Histograms showing the distribution of the point estimates of the
parameters $\Omega_{m,0}$, $M_0$, $\alpha$ and $\beta$. The green vertical lines show
the mean values of the point estimates, and the solid red vertical lines show the
true values of the parameters used to simulate the data. The data are analysed
using the naive likelihood method.

Figure 4.2: The same as Figure 4.1, but for simulations generated using the
values $\sigma_{x_1} = 0.1$ and $\sigma_c = 0.01$.

[Figure 4.3 panel title: $\Omega_{m,0}$: mean = 0.27086, std = 0.036888]

Figure 4.3: The same as Figure 4.1, but for simulations analysed with the
$\chi^2$-method.


We see that this expression is merely the second term on the right-hand side
of Eq. 4.14 (up to a factor of two). The full expression for the (minus)
log-likelihood on the right-hand side of Eq. 4.14 contains two competing terms:
minimisation of the second term ($\chi^2$) would favour large values of
$\sigma_i$, whereas minimisation of the third term would favour small values of
$\sigma_i$. In Eq. 4.16 this last term is simply ignored. Somewhat surprisingly,
it has been established through simulation that discarding this term removes the
bias in the estimated parameters, provided $\sigma_{\rm int}$ is held fixed. It is
fair to say that the fundamental reason for this “magical” removal of the bias is
not well understood, even by those members of the SN community that use the
method on a regular basis.

Typically, the $\chi^2$-function (Eq. 4.16) is minimised simultaneously with
respect to the cosmological parameters and the global SNIa nuisance parameters
$\alpha$, $\beta$ and $M_0$. Analyses differ, however, in the search algorithm
used for this minimisation (e.g. MCMC techniques or grid searches) and in the
treatment of $M_0$ (which is degenerate with $H_0$), in particular whether this
parameter is marginalised over analytically or numerically.

There remains, however, the issue of determining an appropriate value for
$\sigma_{\rm int}$, which is usually performed as follows. Once the minimum value
of $\chi^2$ has been obtained, the value of $\sigma_{\rm int}$ is estimated by
requiring that $\chi^2_{\rm min}/N_{\rm dof} \sim 1$; this process is usually
iterated until convergence is obtained.
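The iteration just described can be sketched on a toy problem. The example below is a simplification, not the procedure of any particular SN analysis: a straight line stands in for the distance-modulus relation, the measurement variances do not depend on the fitted parameters (unlike the real case, where they depend on the stretch and colour coefficients), and all function names are hypothetical.

```python
import numpy as np

def weighted_linear_fit(x, y, var):
    """Minimise chi^2 for y = a + b*x with fixed per-point variances."""
    w = 1.0 / var
    A = np.vstack([np.ones_like(x), x]).T
    return np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (w * y))

def solve_sigma_int(resid, var_meas, n_dof, hi=10.0, tol=1e-10):
    """Bisection for the sigma_int at which chi^2 equals n_dof
    (chi^2 decreases monotonically as sigma_int grows)."""
    def chi2(s):
        return np.sum(resid**2 / (var_meas + s**2))
    if chi2(0.0) <= n_dof:
        return 0.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2(mid) > n_dof:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def chi2_method(x, y, var_meas, n_iter=50, tol=1e-8):
    """Alternate between the chi^2 fit and the sigma_int update."""
    sigma_int = 0.0
    for _ in range(n_iter):
        coef = weighted_linear_fit(x, y, var_meas + sigma_int**2)
        resid = y - (coef[0] + coef[1] * x)
        new = solve_sigma_int(resid, var_meas, n_dof=len(x) - 2)
        converged = abs(new - sigma_int) < tol
        sigma_int = new
        if converged:
            break
    return coef, sigma_int

# Toy data: intercept 1, slope 2, intrinsic scatter 0.3 (all assumed values).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 1.0, n)
var_meas = rng.uniform(0.005, 0.02, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, np.sqrt(var_meas + 0.3**2))
coef, sigma_int = chi2_method(x, y, var_meas)
```

Note that the procedure returns a single number for the intrinsic dispersion, with no uncertainty attached to it.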
The $\chi^2$-method has been extensively tested and found to be satisfactory for
cosmological parameter inference. Results from the toy example are shown in
Figure 4.3. Nevertheless, the method does have a few problems:

1. The use of $\chi^2$ in Eq. 4.16 is not statistically well-motivated, but is based
only on empirical evidence and experience (Gull 1989).

2. The global parameters $\alpha$, $\beta$ appear in both the numerator and
denominator, since they act as both range and location parameters. Thus, the errors
on $\alpha$, $\beta$ are not Gaussian. The informal test which states that
$\chi^2_{\rm min}/N_{\rm dof} \sim 1$ for a good fit model only holds in the
Gaussian case. Hence its use cannot be justified here.

3. Since I use an unnormalised likelihood (without the third term on the
right-hand side of Eq. 4.14), one cannot calculate the Bayesian evidence,
and hence model selection is not possible. One can look on this problem
differently: every model will fit the data equally well, since by construction
$\sigma_{\rm int}$ is determined by demanding that
$\chi^2_{\rm min}/N_{\rm dof} \sim 1$.

4. This method obtains only a best value for $\sigma_{\rm int}$ without any
indication of the error on that value.

4.5 BHM 

Gull (1989) solved the problem of fitting a straight line to data with errors in
both $x$ and $y$. The abstract of this paper is reproduced below:

‘A Bayesian solution is presented to the problem of straight-line
fitting when both variables x and y are subject to error. The
solution, which is fully symmetric with respect to x and y, contains 
a very surprising feature: it requires an informative prior for the 
distribution of sample positions. An uninformative prior leads to a 
bias in the estimated slope.’ 

In essence, this bias is the same as that which stopped us from using the
likelihood in Eq. 4.11 for estimating cosmological parameters.

Gull’s proposed solution has two main steps:

1. Introduction of “hidden variables”, which are the “true” values of the
measured quantities; these variables are nuisance parameters and will be
integrated away in the end.


Figure 4.4: Graphical network showing the deterministic (dashed) and
probabilistic (solid) connections between variables in the BHM. Variables of
interest are in red, hidden (unobserved) variables are in blue and observed data
(denoted by hats) are in green. Credit: March et al. (2011a).

2. The imposition of an informative prior on these variables, which contains 
hyper-parameters that are allowed to vary simultaneously with the other 
parameters, and are then marginalised over. 

These ideas were implemented in the context of SN cosmological analyses in
the BHM (March et al. 2011a).
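The role of the population prior on the hidden variables can be illustrated with a simple numerical experiment. The sketch below is not Gull's full Bayesian solution: it uses a moment-based shortcut that is valid when the true $x$ values are drawn from a Gaussian population and the measurement error on $x$ is known, and every numerical value in it is an assumption chosen for the example. Ignoring the errors on $x$ dilutes the slope, whereas accounting for the width of the latent population recovers it.

```python
import numpy as np

def slopes(x_obs, y_obs, sigma_x):
    """Return (naive, corrected) slope estimates.
    naive: ordinary least squares, which ignores the errors on x.
    corrected: subtracts the known x-error variance from the observed
    variance of x, leaving the variance of the latent population."""
    cov = np.cov(x_obs, y_obs)
    naive = cov[0, 1] / cov[0, 0]
    corrected = cov[0, 1] / (cov[0, 0] - sigma_x**2)
    return naive, corrected

rng = np.random.default_rng(1)
n, a_true, b_true = 20000, 3.0, 0.5
sigma_x, sigma_y = 1.0, 1.0

x_true = rng.normal(0.0, 1.0, n)              # hidden variables, unit width
x_obs = x_true + rng.normal(0.0, sigma_x, n)  # observed x with error
y_obs = a_true * x_true + b_true + rng.normal(0.0, sigma_y, n)

naive, corrected = slopes(x_obs, y_obs, sigma_x)
```

With equal population width and measurement error the naive slope comes out at roughly half the true value, while the corrected estimate is close to the input slope of 3.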

In this method one considers a probability for the full catalogue of SNe
simultaneously, since the form of the BHM likelihood for a set of SN observations
($i = 1, 2, \ldots, N$) is not simply the product of the likelihoods for each
individual SN.

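One begins by considering the probability

$$P = \Pr(\hat{\mathbf{m}}^*_B, \hat{\mathbf{x}}_1, \hat{\mathbf{c}}, \hat{\mathbf{z}},
\mathbf{m}^*_B, \mathbf{x}_1, \mathbf{c}, \mathbf{z}, \mathbf{M} \,|\,
\mathscr{C}, \alpha, \beta, \sigma_{\rm int}, C, \sigma_z), \tag{4.17}$$

where boldface symbols denote vectors containing the corresponding parameters
for each SN ($i = 1, 2, \ldots, N$). I denote the parameters of the model to be
fitted by $\phi = \{\mathscr{C}, \alpha, \beta, \sigma_{\rm int}\}$, and those
assumed known by $\psi = \{C, \sigma_z\}$.

The BHM likelihood function described in March et al. (2011a) is obtained
by marginalising Eq. 4.17 over the hidden variables, together with other
nuisance parameters describing the SN population. One can see the relation
between different types of variables in this model in Figure 4.4. Thus, one obtains
$\mathcal{L}_{\rm BHM}(\phi) \equiv \Pr(\hat{\mathbf{m}}^*_B, \hat{\mathbf{x}}_1,
\hat{\mathbf{c}}, \hat{\mathbf{z}} \,|\, \phi, \psi)$ by marginalising as follows:

$$\mathcal{L}_{\rm BHM}(\phi) = \int d\mathbf{m}^*_B\, d\mathbf{x}_1\, d\mathbf{c}\,
d\mathbf{z}\, d\mathbf{M}\;
\Pr(\hat{\mathbf{m}}^*_B, \hat{\mathbf{x}}_1, \hat{\mathbf{c}}, \hat{\mathbf{z}},
\mathbf{m}^*_B, \mathbf{x}_1, \mathbf{c}, \mathbf{z}, \mathbf{M} \,|\, \phi, \psi).
\tag{4.18}$$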


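To perform this marginalisation, one assumes that: (i) the measured redshift
$\hat{z}_i$ is independent of $\hat{m}^*_{B,i}$, $\hat{x}_{1,i}$ and $\hat{c}_i$;
(ii) the true redshift $z_i$ is independent of $M_i$, $x_{1,i}$, $c_i$; and
(iii) the exact relationship
$\mu(z_i, \mathscr{C}) = m^*_{B,i} - M_i + \alpha x_{1,i} - \beta c_i$
between the hidden variables holds. This enables one to write Eq. 4.17 as

$$P = \Pr(\hat{\mathbf{m}}^*_B, \hat{\mathbf{x}}_1, \hat{\mathbf{c}} \,|\,
\mathbf{m}^*_B, \mathbf{x}_1, \mathbf{c}, C)\,
\Pr(\hat{\mathbf{z}} \,|\, \mathbf{z}, \sigma_z)\,
\Pr(\mathbf{M}, \mathbf{x}_1, \mathbf{c} \,|\, \sigma_{\rm int})\,
\Pr(\mathbf{z})\,
\delta\!\left[\mathbf{m}^*_B - \mathbf{M} + \alpha\mathbf{x}_1 - \beta\mathbf{c}
- \mu(\mathbf{z}, \mathscr{C})\right]. \tag{4.19}$$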


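The presence of the delta-function allows one to perform the integral over
$\mathbf{m}^*_B$ immediately. Moreover, the first two probability distributions on
the right-hand side are simply the product of the corresponding distributions for
each SN separately, namely

$$\Pr(\hat{\mathbf{m}}^*_B, \hat{\mathbf{x}}_1, \hat{\mathbf{c}} \,|\,
\mathbf{m}^*_B, \mathbf{x}_1, \mathbf{c}, C) = \prod_{i=1}^{N}
\frac{1}{|2\pi C_i|^{1/2}}
\exp\left[-\tfrac{1}{2}(\hat{v}_i - v_i)^T C_i^{-1} (\hat{v}_i - v_i)\right],
\tag{4.20}$$

$$\Pr(\hat{\mathbf{z}} \,|\, \mathbf{z}, \sigma_z) = \prod_{i=1}^{N}
\frac{1}{(2\pi\sigma_{z,i}^2)^{1/2}}
\exp\left[-\frac{(\hat{z}_i - z_i)^2}{2\sigma_{z,i}^2}\right], \tag{4.21}$$

where the vector $v_i$ is defined by $v_i = (m^*_{B,i}, x_{1,i}, c_i)^T$ and
similarly for $\hat{v}_i$.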

[Figure 4.5 panel titles: $\Omega_{m,0}$: mean = 0.26973, std = 0.036555;
$\alpha$: mean = 0.11927, std = 0.0056804; $\beta$: mean = 3.2203, std = 0.1056]

Figure 4.5: The same as Figure 4.1, but for simulations analysed with the BHM.
Note that the parameter $M_0$ is marginalised over analytically.

A key difference between the BHM and the standard $\chi^2$-method is the
specification of the prior
$\Pr(\mathbf{M}, \mathbf{x}_1, \mathbf{c} \,|\, \sigma_{\rm int})$ on the
right-hand side of Eq. 4.19. A common choice is to consider the SNe as true
standard candles by assigning $\Pr(M_i) = \delta(M_i - M_0)$, while adopting a
uniform normalised top hat distribution on each remaining hidden variable $c_i$
and $x_{1,i}$ (D'Agostini 2005), but this just leads to the naive likelihood
discussed in Section 4.3, with $\sigma_{\rm int}$ set to zero. I therefore impose a
different prior here. It is again assumed separable in the sense that
$\Pr(\mathbf{M}, \mathbf{x}_1, \mathbf{c} \,|\, \sigma_{\rm int}) =
\Pr(\mathbf{M} \,|\, \sigma_{\rm int}) \Pr(\mathbf{x}_1) \Pr(\mathbf{c})$, but each
of the distributions on the right-hand side does not factorise into terms
corresponding to individual SNe. This occurs since the BHM introduces and
marginalises over additional nuisance hyper-parameters $M_0$, $x_*$, $R_x$, $c_*$
and $R_c$ associated with the SN population, and described below. In particular,
one writes

$$\Pr(\mathbf{M} \,|\, \sigma_{\rm int}) = \int dM_0\, \Pr(M_0)
\prod_{i=1}^{N} \Pr(M_i \,|\, M_0, \sigma_{\rm int}), \tag{4.22}$$

$$\Pr(\mathbf{x}_1) = \iint dx_*\, dR_x\, \Pr(x_*) \Pr(R_x)
\prod_{i=1}^{N} \Pr(x_{1,i} \,|\, x_*, R_x), \tag{4.23}$$

$$\Pr(\mathbf{c}) = \iint dc_*\, dR_c\, \Pr(c_*) \Pr(R_c)
\prod_{i=1}^{N} \Pr(c_i \,|\, c_*, R_c), \tag{4.24}$$



in which it is assumed that a number of probability distributions are separable.
The prior distribution of each of the hidden variables $M_i$, $x_{1,i}$, $c_i$ is
assumed to be Gaussian, so that

$$\Pr(M_i \,|\, M_0, \sigma_{\rm int}) = \mathcal{N}(M_0, \sigma_{\rm int}^2), \tag{4.25}$$

$$\Pr(x_{1,i} \,|\, x_*, R_x) = \mathcal{N}(x_*, R_x^2), \tag{4.26}$$

$$\Pr(c_i \,|\, c_*, R_c) = \mathcal{N}(c_*, R_c^2). \tag{4.27}$$


Finally, one must also assign the priors on the nuisance hyper-parameters
$M_0$, $x_*$, $R_x$, $c_*$, $R_c$, which are taken to be

$$\Pr(M_0) = \mathcal{N}(\bar{M}_0, \sigma_{M_0}^2), \tag{4.28}$$

$$\Pr(x_*) = \mathcal{N}(0, \sigma_{x_*}^2), \tag{4.29}$$

$$\Pr(c_*) = \mathcal{N}(0, \sigma_{c_*}^2), \tag{4.30}$$

$$\Pr(R_x) \propto 1/R_x, \tag{4.31}$$

$$\Pr(R_c) \propto 1/R_c, \tag{4.32}$$

where one assumes $\bar{M}_0 = -19.3$ mag, $\sigma_{M_0} = 2.0$ mag,
$\sigma_{x_*} = 1$ and $\sigma_{c_*} = 1$. The priors on $x_*$ and $c_*$ are taken
as Gaussian because this is the maximum-entropy prior on variables that can take
positive and negative values, and for which one has an expectation for the mean
and variance. In this case a mean of zero and a standard deviation of unity is
appropriate. Since the widths $R_x$ and $R_c$ are non-negative scale parameters,
we adopt the non-informative Jeffreys prior on them.

All the necessary integrals in Eq. 4.18 and Eqs. 4.22-4.24 are Gaussian,
except those over $R_x$ and $R_c$. Thus, March et al. (2011) integrate analytically
to obtain a final expression for the likelihood in terms of an integral over only
$R_x$ and $R_c$; these parameters are therefore added to the parameter vector
$\phi$, sampled from and marginalised out numerically to recover
$\mathcal{L}_{\rm BHM}(\phi)$ defined in Eq. 4.18.
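To see why the Gaussian integrals are analytic, it may help to look at a single hidden stretch value in a simplified setting. The following worked identity assumes, purely for illustration, that the measurement error on the stretch is uncorrelated with the other SALT2 outputs (the full calculation uses the complete covariance matrix). Marginalising the hidden variable then amounts to convolving two Gaussians:

```latex
\int \mathcal{N}\!\left(\hat{x}_{1,i} \,\middle|\, x_{1,i}, \sigma_{x_1,i}^2\right)
     \mathcal{N}\!\left(x_{1,i} \,\middle|\, x_*, R_x^2\right) \mathrm{d}x_{1,i}
  = \mathcal{N}\!\left(\hat{x}_{1,i} \,\middle|\, x_*, \sigma_{x_1,i}^2 + R_x^2\right)
```

The population width therefore adds in quadrature to the measurement error. The result is Gaussian in the population mean, so that integral is also analytic, but it is not Gaussian in the width, which is why the widths are left to be sampled numerically.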

One can see from Figure 4.5 that the BHM shows the same level of precision
for parameter estimates as the standard $\chi^2$-method, so both methods yield
unbiased parameter constraints. For the BHM, however, one obtains two additional
benefits: (i) the full probability distribution for $\sigma_{\rm int}$; and (ii)
the ability to perform consistent Bayesian inference, including model selection.
In Paper I, I compare the $\chi^2$-method and BHM in much more detail. I applied
both of these methods to more realistic examples than the toy example used in this
chapter and to real data (see Paper I and Section 6.1).

In spite of these advantages, one should also mention some shortcomings
of BHM. First, a key difference between BHM and the standard $\chi^2$-method is
that $\mathcal{L}_{\rm BHM}$ does not depend on $M_0$, since this parameter is
marginalised out analytically. Similarly, so too are the hidden variables $M_i$,
which correspond to the “true” absolute magnitude of each SN. Consequently, one
cannot calculate the residuals in Eq. 4.10 for each SN. This can make it difficult
to compare the BHM with other methods, since visual inspection of the residuals
often plays an important role. A further, and perhaps more critical, shortcoming
of the BHM is the assumption that the hidden variables $c_i$ and $x_{1,i}$ do not
depend on redshift; this is a very crude approximation that is almost certainly
not true. In the event that these quantities do evolve with redshift, one may show
using simulations that the BHM can begin to produce biased parameter constraints,
whereas the standard $\chi^2$-method is more robust to this effect, even though
the formulation does not explicitly allow for this eventuality; this is
illustrated in Figure 4.6.


4.6 Generalised Bayesian Likelihood 

The above shortcomings of the BHM result from the assumption of simple
(redshift-independent) Gaussian priors on the hidden variables and nuisance
parameters, which is necessary in order to marginalise over them analytically.
Consequently, it is of interest to retain the overall structure of the BHM, but
to replace the analytical marginalisation with a numerical one. This results in
a larger computational burden, but allows for the imposition of more realistic
priors on the hidden variables, including redshift dependence, and provides
access to best-fit values (indeed full marginal posterior distributions) of the
hidden variables.

Figure 4.6: The same as Figure 4.5, but for SNe with
$x_1(z > 0.5) = x_1(z < 0.5) + 1$, analysed using the $\chi^2$-method (top row)
and the BHM (bottom row).

It therefore seems natural to improve upon the BHM as follows, to produce 
a Generalised Bayesian Likelihood. 

1. Integrate Eq. 4.18 and Eqs. 4.22-4.24 numerically by sampling from the
set of hidden parameters $x_{1,i}$, $c_i$, $z_i$, $M_i$ ($i = 1, 2, \ldots, N$) and
nuisance hyper-parameters $M_0$, $x_*$, $R_x$, $c_*$, $R_c$ (in addition to the
parameters of interest $\phi$), and numerically (as opposed to analytically)
marginalising over them.

2. Allow for more realistic priors on the hidden and nuisance parameters than
the simple separable Gaussian forms assumed in the BHM, in particular
including the possibility of redshift dependence of $c_i$ and $x_{1,i}$.
The resulting approach does, however, carry with it a substantially increased
computational burden compared to the BHM, since the dimension of the parameter
space to be sampled is increased by $4N + 5$, where $N$ is the number
of SNe being analysed. Even for existing SN data-sets, such as Union2 with
over 600 SNe, this leads to a parameter space with over 2000 dimensions. To
sample from a posterior distribution of this dimensionality and evaluate the
evidence exceeds the current capabilities of the widely used Bayesian inference
code MultiNest. Therefore alternative methods are required. Provided the
posterior distribution is relatively benign, i.e. smooth, unimodal and without
very pronounced tails, then MCMC methods, such as Gibbs or Hamiltonian
sampling, can produce reliable parameter estimates, and be combined with
thermodynamic integration to evaluate the evidence. Another possibility is to use
alternative forms of nested sampling, such as the DNest algorithm (Brewer,
Pártay, & Csányi 2010), which is considerably less efficient than MultiNest up to
around 50 dimensions, but is capable of producing posterior samples and
evidence estimates in spaces with several thousand dimensions. I am currently
investigating these various alternative sampling methods, and plan to present
the Generalised Bayesian Likelihood and its application to real and simulated
SN data in a forthcoming publication.
Chapter 5


Gravitational lensing of SNe


‘Light thinks it travels faster than
anything but it is wrong. No
matter how fast light travels, it
finds the darkness has always got
there first, and is waiting for it.’

Terry Pratchett

General relativity predicts that light rays are bent around massive bodies or,
more generally, undergo deflections when they traverse a region in which the
gravitational field is inhomogeneous. This effect is relevant to the study of SNe
since the mass distribution along the line-of-sight to a SN will cause a
gravitational lensing effect. The discussion in the previous chapter, and indeed in
most papers on SN cosmology, ignores this effect. Nonetheless, such gravitational
lensing can be used to constrain the form of the galaxy dark matter haloes
along the lines-of-sight to the SNe, as I show in Paper II and Section 6.2. In
this chapter I therefore give a brief description of some of the basic features of
gravitational lensing.

5.1 Lens equation 

Ray-tracing in curved spacetime, as illustrated in Figure 5.1, can be described
by a lens equation,

$$\theta_s = \theta_i - \frac{D_{ls}}{D_s}\,\alpha, \tag{5.1}$$

where $\theta_s$ is the angle between the source and the lens that would be observed
in the absence of lensing, $\theta_i$ is the observed angle between image and lens,
and $\alpha$ is the deflection angle; all the distances are angular-diameter
distances $D_A$, since these are defined such that $D_A = \ell/\Delta\theta$, where
$\ell$ is the proper transverse distance and $\Delta\theta$ is the angle it subtends
at the observer. In our expanding Universe, for objects at redshift $z_a$ and $z_b$,
the angular-diameter distance (for a spatially flat universe) is given by

$$D_A(z_a, z_b) = \frac{1}{(1 + z_b) H_0} \int_{z_a}^{z_b} \frac{dz}{E(z)}, \tag{5.2}$$

where $E(z) \equiv H(z)/H_0$ and I have set $c = 1$.

Figure 5.1: Gravitational lens geometry.
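A minimal numerical sketch of the angular-diameter distance between two redshifts is given below. It assumes a spatially flat universe containing only matter and a cosmological constant, restores the speed of light so that the result is in Mpc, and uses illustrative default parameter values; the function names are hypothetical.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def e_of_z(z, omega_m):
    """Dimensionless expansion rate H(z)/H0 for flat LambdaCDM (assumed)."""
    return np.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))

def angular_diameter_distance(z_a, z_b, h=0.7, omega_m=0.3, n=2001):
    """Angular-diameter distance in Mpc between redshifts z_a <= z_b."""
    z = np.linspace(z_a, z_b, n)
    f = 1.0 / e_of_z(z, omega_m)
    # Trapezoidal rule, written out explicitly.
    integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z))
    hubble_distance = C_KM_S / (100.0 * h)  # c / H0 in Mpc
    return hubble_distance * integral / (1.0 + z_b)

d_l = angular_diameter_distance(0.0, 0.5)   # observer to lens
d_s = angular_diameter_distance(0.0, 1.0)   # observer to source
d_ls = angular_diameter_distance(0.5, 1.0)  # lens to source
```

Angular-diameter distances are not additive: the lens-to-source distance is not the difference of the other two, although in this flat case the corresponding comoving distances do add.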

In the case of a point mass lens, with mass $M$, Eq. 5.1 becomes

$$\theta_s = \theta_i - \frac{D_{ls}}{D_s D_l} \frac{4GM}{\theta_i}. \tag{5.3}$$

In the extreme case in which the source, lens and observer are collinear
($\theta_s = 0$), the source will be lensed into an Einstein ring whose angular
radius is given by the Einstein angle:

$$\theta_E = \sqrt{\frac{4GM D_{ls}}{D_s D_l}}. \tag{5.4}$$

Thus for each lens one may define a critical surface mass density, above which
one obtains an Einstein ring or multiple images; this is given by

$$\Sigma_c = \frac{1}{4\pi G} \frac{D_s}{D_l D_{ls}}. \tag{5.5}$$


Now, if one considers a more general lens than a point mass, for any surface
density $\Sigma(\theta_i)$, which is obtained by projecting the matter distribution
on to the lens plane, one can introduce the dimensionless scaled surface mass
density, or convergence $\kappa(\theta_i)$, given by

$$\kappa(\theta_i) = \frac{\Sigma(\theta_i)}{\Sigma_c}. \tag{5.6}$$

In other words, the convergence describes the focussing by the lens of the light
emitted by the source. This focussing causes the source to appear larger.
According to Liouville’s theorem (conservation of the phase-space density of the
photons emitted by the source), the increase of size will lead to increase of
brightness. At the same time, distortion by twisting of the light rays in the lens
can occur. This will lead to shearing of the image shape.

To describe both phenomena one can introduce the lens map

$$\mathcal{A} = \begin{pmatrix}
1 - \kappa - \gamma_1 & -\gamma_2 \\
-\gamma_2 & 1 - \kappa + \gamma_1
\end{pmatrix},$$

where $\gamma_1 = \gamma\cos(2\varphi)$ and $\gamma_2 = \gamma\sin(2\varphi)$, with
$\gamma$ being the ellipticity and $\varphi$ a position angle for an initially
circular source that has been lensed into an ellipse. In terms of the convergence
and shear one can define the important quantity known as the magnification:

$$\mu = \frac{1}{(1 - \kappa)^2 - \gamma^2}, \tag{5.7}$$

which corresponds to the ratio of the image area to the source area.
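As a small worked example, the magnification can be evaluated directly from the convergence and shear. The conversion to a magnitude shift uses the standard relation between flux ratio and magnitudes, which is not written out in the text above, and the input values are illustrative only.

```python
import math

def magnification(kappa, gamma=0.0):
    """Magnification from convergence kappa and shear gamma (Eq. 5.7)."""
    return 1.0 / ((1.0 - kappa) ** 2 - gamma ** 2)

def magnitude_shift(kappa, gamma=0.0):
    """Change in apparent magnitude; negative means the source looks brighter."""
    return -2.5 * math.log10(magnification(kappa, gamma))

mu_weak = magnification(0.01)    # small convergence, no shear
dm_weak = magnitude_shift(0.01)
```

For small convergence and negligible shear the magnification is close to 1 + 2κ, so a convergence of 0.01 brightens a SN by roughly 0.02 mag.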


5.2 Types of gravitational lensing 

The obvious case of interest is when the source is within the Einstein angle of the 
lens and multiple images, arcs, or even distinct parts of an Einstein ring appear; 
this is called strong gravitational lensing. Although predicted long before, the 
first multiple-image system was discovered by Walsh et al. (1979). 

Observing multiple images of the same source, one can estimate the lens
matter density distribution from simple Euclidean space calculations. One can
also use the fact that some of the sources vary with time, so the multiple
images could also vary with time. Time delays of the multiple images are another
powerful mechanism which can be used to calculate the Hubble constant.

Strong lensing is often divided into subcategories, depending on the angular
separation of the images. Using the point-mass approximation in Eq. 5.4 we
can get a feeling for the amount of the lensing in typical astronomical situations:




arcseconds 


(5.8) 
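The order of magnitude of the Einstein angle in Eq. 5.4 can be checked numerically. The sketch below restores the factor of the speed of light that is set to one in the text, and the masses and distances are illustrative assumptions (a solar-mass lens at Galactic distances and a galaxy-mass lens at cosmological distances), not values taken from this thesis.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
PC = 3.0857e16       # parsec, m
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0

def einstein_angle_arcsec(mass_kg, d_l, d_s, d_ls):
    """Einstein angle of a point-mass lens (Eq. 5.4), distances in metres."""
    theta = math.sqrt(4.0 * G * mass_kg * d_ls / (C ** 2 * d_s * d_l))
    return theta * RAD_TO_ARCSEC

# Star in the Galaxy: lens at 5 kpc, source at 10 kpc (assumed values).
theta_star = einstein_angle_arcsec(M_SUN, 5e3 * PC, 1e4 * PC, 5e3 * PC)

# Galaxy-mass lens with distances of order a Gpc (assumed values).
theta_galaxy = einstein_angle_arcsec(1e11 * M_SUN, 0.5e9 * PC, 1e9 * PC, 0.5e9 * PC)
```

With these inputs the stellar lens gives an Einstein angle of about a milliarcsecond, while the galaxy-mass lens gives about an arcsecond.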


The above estimates relate, respectively, to the two most commonly occurring
situations: microlensing and macrolensing. Microlensing is interesting
since lenses range from about the mass of a planet to the mass of a star.
Microlensing is commonly used to: (i) constrain the nature of dark matter, for
example in searches for MACHOs (massive compact halo objects); (ii) detect
extrasolar planets; (iii) constrain the structure of the Milky Way disk, and do
much more. Macrolensing, with separations of typically arcseconds, is the range
where most of the collected images occur, so macrolensing is often a synonym
for strong lensing itself.

Depending on the alignment of the lens and source, weak lensing could
appear. Calling this lensing “weak” we should remember that it is not
necessarily less important. Usually, weak lensing occurs when the source lies
outside the Einstein radius of the lens and, compared to strong gravitational
lensing, results in small magnifications and small image distortions, which often
makes it impossible to detect without a priori knowledge of the source properties.
While strongly lensed images often can tell us about the structure of the lens,
weak gravitational lensing images allow us only to probe the statistical properties
of the matter distribution on the line-of-sight. Nevertheless, weak lensing is one
of the most common effects observed in the Universe. At some level, all objects
that emit light and are observed at Earth are affected.


5.3 SNe through gravitational telescopes 


Zwicky had already proposed by the 1930s to use galaxies as gravitational
lenses. However, it was not until the late 1970s that the first gravitationally
lensed objects were detected. From the very beginning, SNe have been taken
into account in the calculation of the lensed images as being one of the
background sources (Refsdal 1964). Since that time several applications of lensed
SNe have been explored.

Unfortunately, no multiply imaged SN has yet been observed. Nevertheless,
magnification/demagnification by large-scale structure along the line-of-sight
can be used to estimate weak lensing effects.

This effect is a statistical “nuisance” in obtaining cosmological inferences
from SNIa. There are two ways to correct for it: (i) assume some overall
magnification distribution (e.g. Holz & Linder 2005; Martel & Premadi 2008) or
(ii) calculate convergence along each SN line-of-sight using simplified scaling
relations applied to the nearby foreground galaxies observed (e.g. Jönsson et
al. 2008; Jönsson et al. 2010b; Kronborg et al. 2010; Kostrzewa-Rutkowska,
Wyrzykowski, & Jaroszyński 2013; Smith et al. 2014). The last method can
also be used to estimate parameters of the objects on the line-of-sight, such as
dark matter halo parameters, which I have done in Paper II and Section 6.2.

In recent wide-field surveys, a few SNe have been found that are detectably
magnified. Among them are three gravitationally lensed SNe (SN CLO12Car,
SN CLN12Did, and SN CLA11Tib) behind CLASH clusters (Patel et al. 2014).
The reason for lensed SNe to be such rare events is that the SN has to be
precisely aligned with a gravitational lens.
Chapter 6


Summary of my results


‘In God we trust. All others must
bring data.’

W. Edwards Deming

My main research results are in the five papers included in this thesis, which
describe different aspects of how observations of SNe can be used in
astrophysics and cosmology. In particular, the papers focus on: the use of SNIa
in constraining the background cosmological model describing our Universe;
the use of gravitational lensing of distant SNIa by foreground cosmic structure
to constrain the nature of dark matter haloes of galaxies; the determination from
photometric lightcurves of whether or not a given SN is of Type Ia, and hence
can be used in cosmological inference; checking for consistency between
different SN data-sets within large compilations. In this chapter, I will summarise
the results from my papers, give an update on the already published results and
describe findings that are yet to be published.

6.1 SNe and cosmology 

In Chapter 4, I have already presented a general account of different methods for
cosmological inference from SNIa data. In Paper I, I present a detailed
comparison of the standard $\chi^2$-methodology and the recently proposed BHM
applied to SNIa lightcurves fitted with the SALT2 technique. I described these two
methods in Sections 4.4 and 4.5 respectively. Through the analysis of realistically
simulated SN data-sets, I obtain similar results in Paper I to those obtained from
the toy example described in Chapter 4.

Figure 6.1: Left (right) panel: sampling distributions derived from a
$\Lambda$CDM analysis of 100 simulated SNLS (‘cosmology’) data-sets. The
histograms show the point estimates for the SNIa global parameters $\alpha$,
$\beta$ and the cosmological parameters $\Omega_{m,0}$, $\Omega_{\Lambda,0}$
inferred using the BHM (filled blue histograms) and the chi-square method (empty
green histogram). Blue and green vertical lines show the mean values of the point
estimates, solid red vertical lines show the value of the true (i.e. model input)
parameter used to simulate the data. From Paper I.


6.1.1 Biases in the $\chi^2$-method and BHM


In Paper I, I establish that small biases in the recovery of cosmological
parameters occur for both the standard $\chi^2$-methodology and the BHM. These
biases are not the same for each method, however, leading to discrepancies between
their results, which are greatest when analysing just a single survey, such as
SNLS; see the left panel in Figure 6.1. I find that the BHM offers a modest
advantage over the $\chi^2$-method in that it produces slightly less biased
estimates of the parameters; this is particularly true for $\Omega_{m,0}$. The
biases on the $\Omega_{m,0}$ estimates produced by the two methods are in opposite
directions, which results in approximately a $2\sigma$ discrepancy for any given
realisation of SNLS-type data. Most interestingly, I find this to be the case for
the real SNLS data-set. However, in simulations, one finds that increasing the
redshift range of the data-set reduces the discrepancy between the methods; see
the right panel of Figure 6.1. As more higher and lower redshift SNIa are added to
the sample, the estimates of the cosmological parameters of interest from the two
methods begin to converge. Nonetheless, this conclusion may be premature; although
we generated state-of-the-art simulations, they contained no redshift dependence
of the SN properties.
                       Union2                                  S11
Data-set   log(Z_H12)  log(Z_H22)  log(Z_H13)   log(Z_H12)  log(Z_H22)  log(Z_H13)
ESSENCE       9.21       10.18        8.57         n/a         n/a         n/a
HST           0.80       -0.16        0.23        -0.30       -0.63       -1.07
SDSS         -0.25       -0.77       -1.13        -0.33       -0.61       -1.27
SNLS          2.08        1.34        1.41         4.87        3.62        3.68
CfA           0.00       -0.20       -0.95        -0.01       -0.32       -1.02
Low-z         n/a         n/a         n/a          0.01       -0.30       -0.97

Table 6.1: Log-evidence values for models $H_{n_\alpha n_\beta}$ ($n_\alpha$ and
$n_\beta$ being the number of redshift bins for stretch and colour parameter
populations respectively) relative to the model $H_{11}$. Errors on these
log-evidence values are all around 0.09.


6.1.2 BHM with redshift-dependent stretch and colour 
corrections 

As mentioned in Chapter 4, it is possible for the stretch and colour parameter
populations to evolve with redshift; indeed some studies have already tried to
explore this possibility (Kessler et al. 2009; Conley et al. 2011). Kessler et al.
(2009), in their Section 10.2.3, present evidence for the redshift evolution of
the colour parameter $\beta$ for the SALT2 lightcurve fitting algorithm for
different combinations of samples in the full data-set. This question has been
revisited in Section 5.7 of Conley et al. (2011); they find that using later
versions of SALT2 results in just marginal evidence for the evolution of the
parameter $\beta$, but they do not discuss how this changes with different
combinations of data-sets. I revisit this question by applying the BHM to the data
from Union2 and Sullivan et al. (2011) (hereafter S11). I divide these SNe
according to the telescope with which they have been observed; in particular I
divide the SNe into the following subsets: ESSENCE, HST, SDSS, SNLS, CfA and a
compilation of low-z SNIa measurements.

In order to check whether these data-sets have redshift evolution in the
stretch and colour parameters, I modify the BHM to allow for this evolution by
introducing multiple contiguous redshift bins for the stretch and colour parameter
populations. SNe within different stretch (colour) redshift bins are allowed
to have different values of $\alpha$ ($\beta$) and $R_x$ ($R_c$). This is achieved
by allowing the priors on $\alpha$ ($\beta$) and $R_x$ ($R_c$) to be completely
independent in different stretch (colour) redshift bins. The lower (upper) limits
on the first (last) redshift bin are set to $z_{\rm min}$ ($z_{\rm max}$), where
$z_{\rm min}$ and $z_{\rm max}$ are the minimum and maximum SN redshifts in the
catalogue. The other end points of redshift bins are set as free parameters which
are estimated along with other BHM parameters. The number of stretch and colour
redshift bins, represented by $n_\alpha$ and $n_\beta$, are estimated by
Bayesian model selection, done by analysing models with different values of
$n_\alpha$ and $n_\beta$, starting with $n_\alpha = n_\beta = 1$, and picking the
model with the highest value for the Bayesian evidence. We denote these models by
$H_{n_\alpha n_\beta}$.

The log-evidence values for models $H_{12}$, $H_{22}$ and $H_{13}$, all with
respect to the base model $H_{11}$, are given in Table 6.1. It is evident from this
table that $H_{12}$ ($n_\alpha = 1$, $n_\beta = 2$) is the preferred model for most
data-sets. Even when $H_{12}$ is not the most preferred model, preference for other
models over it is not very strong. Thus, in agreement with the previous studies
mentioned above, we see that our Bayesian model selection approach also provides
evidence for some evolution with redshift of the $\beta$ parameter. This is an
interesting finding in its own right, but also has important consequences for using
the standard BHM, which assumes no redshift dependence for any of the parameters.
Clearly, this assumption is broken by the real data, and so one must take care in
interpreting results obtained using the “vanilla” BHM. Although the modification to
the BHM introduced above has its uses, a more statistically-principled approach to
extending the BHM is to allow the priors on the colour and stretch parameters
to depend on redshift. This generalised Bayesian likelihood method is discussed
in Section 4.6.
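The selection rule used here, namely picking the model with the highest evidence, can be written down in a few lines. The sketch below uses the Union2 SNLS entries of Table 6.1 and assumes that the three tabulated columns correspond to the models with (1, 2), (2, 2) and (1, 3) stretch and colour bins, in that order; the function name is hypothetical and the base model has relative log-evidence zero by construction.

```python
def preferred_model(relative_log_evidence, baseline="H11"):
    """Return (name, log-evidence) of the model with the highest evidence.
    Log-evidences are given relative to the baseline model, which therefore
    enters the comparison with the value zero."""
    candidates = {baseline: 0.0}
    candidates.update(relative_log_evidence)
    best = max(candidates, key=candidates.get)
    return best, candidates[best]

# Union2 SNLS row of Table 6.1 (column order assumed as stated above).
snls_union2 = {"H12": 2.08, "H22": 1.34, "H13": 1.41}
best, log_z = preferred_model(snls_union2)
```

Differences in log-evidence that are comparable to the quoted error of about 0.09 should not be read as a preference for one model over another.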


6.2 SNe and cosmic structure 

In Paper II, I present a Bayesian statistical methodology for constraining the
properties of dark matter haloes of foreground galaxies that intersect the
lines-of-sight towards SNIa. This builds on the BHM used in Paper I.

In this approach, the parameters of interest are those describing the dark
matter haloes assumed to exist around the known galaxies along the
lines-of-sight to the SNIa. My method yields an effective likelihood function,
which gives the probability of obtaining the observed SNIa data (i.e. the parameter
values obtained in SALT2 lightcurve fits) as a function of these parameters.
Once appropriate priors have been placed on the parameters, the full posterior
distribution is explored using MultiNest to obtain parameter constraints and
also calculate the Bayesian evidence for use in model comparison.

I investigate two different models for the density profile $\rho(r)$ of the dark
matter halo: the truncated singular isothermal sphere (tSIS) and the Navarro-
Frenk-White (NFW) profile (Navarro, Frenk, & White 1997), both of which are
widely used models in astronomy. My results for the SIS profile are presented in
Paper II. The NFW results were also included in the first version of Paper II, but
were subsequently removed during the refereeing process, since it was necessary
to add significant further discussion of the methodology, which resulted in the
paper becoming too long. I summarise below the main findings for both the SIS
and NFW profiles; note that the full version of the original paper is still online
as the first arXiv version of Paper II.

Figure 6.2: Left (right) panel: 1D and 2D marginalised posterior distributions
for the parameters of the tSIS halo model, derived from the analysis of 500
simulated SNIa generated from a tSIS model (real SNLS data). From Paper II.


6.2.1 SIS model 


The SIS model has a radial density profile given by

$$\rho(r) = \frac{\sigma^2}{2\pi G}\,\frac{1}{r^2}, \tag{6.1}$$

which is a function of just one free parameter, namely the one-dimensional
velocity dispersion $\sigma$ of its constituent particles. A disadvantage of the SIS
profile is that the total mass diverges. Consequently, I use a modified version that
depends on a second free parameter $r_t$, which defines the radius at which the
SIS profile is truncated. For finite $r_t$, the total mass does not diverge.
It is straightforward to show that for an object described by a tSIS profile,
the surface density is



(6.2)


By substituting this expression into Eq. 5.6, one obtains the corresponding
convergence $\kappa_{\rm gal}(\theta)$ produced by a galaxy dark matter halo of this form.
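As a cross-check of the projection step, the following is a minimal sketch of how a surface density follows from Eq. 6.1. It assumes a sharp cut-off of the density at $r = r_t$ and writes $\xi$ for the projected radius in the lens plane and $l$ for the line-of-sight coordinate; the sharp cut-off and these symbols are assumptions made here for illustration, so the result is a reconstruction and may differ in form from the expression labelled Eq. 6.2.

```latex
\begin{align}
\Sigma(\xi)
  &= 2\int_0^{\sqrt{r_t^2-\xi^2}} \rho\!\left(\sqrt{\xi^2+l^2}\right)\mathrm{d}l
   = \frac{\sigma^2}{\pi G}\int_0^{\sqrt{r_t^2-\xi^2}} \frac{\mathrm{d}l}{\xi^2+l^2} \\
  &= \frac{\sigma^2}{\pi G\,\xi}\,
     \arctan\!\left(\frac{\sqrt{r_t^2-\xi^2}}{\xi}\right)
     \quad (\xi \le r_t), \qquad \Sigma(\xi) = 0 \quad (\xi > r_t), \\
M(r \le r_t)
  &= 4\pi\int_0^{r_t} \rho(r)\,r^2\,\mathrm{d}r = \frac{2\sigma^2 r_t}{G}.
\end{align}
```

Under these assumptions the limit $r_t \to \infty$ gives $\Sigma(\xi) = \sigma^2/(2G\xi)$, the familiar untruncated SIS result, while the last line makes explicit that the total mass is finite for finite $r_t$.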

An important complication that arises is the potential relationship between 
the velocity dispersion and the galaxy luminosity. To allow for this possibility 
and to determine the form of the relationship, I assume the scaling relations: 



(6.3) 



(6.4) 


Thus, for the tSIS halo model, we wish to constrain the four parameters h =
{$\gamma$, $\eta$, $\sigma_*$, $r_*$}.

To evaluate my methodology, I first apply it to simulated SNIa data-sets. To
this end, I generate and analyse multiple sets of simulations, each the same size
as the real SNLS data-set. My main finding is that there is a wide variation
in the significance at which one may detect a gravitational lensing signal. This
results from the strong dependence of the gravitational lensing signal on whether
the data sample contains some SNIa that are strongly magnified. The number
of such SNIa in the sample has a marked effect on the derived log-evidence
$\Delta \log Z$ relative to a model assuming no lensing, which I find ranges from about
$-1.5$ to $4.5$, with a median value of $-0.6$. The parameter constraints derived
from this median catalogue are very broad; indeed the constraints are similar
to those obtained from a simulation containing no lensing signal. Nonetheless,
as shown in the left panel of Figure 6.2, if one increases the number of SNe in
the data-set up to 500, the constraints become much tighter and one is able to
set limits on the parameters of the dark matter haloes. Indeed, these constraints
contain the true values input to the simulations. Thus, provided the sample of
SNe is sufficiently large, my method is able to detect the gravitational lensing
signal and recover the correct halo parameters.

When applied to real SNLS data (consisting of 162 SNIa), the parameter
constraints are those shown in the right panel of Figure 6.2. On performing a
Bayesian model comparison, I find that the model for a lensing signal produced
by tSIS haloes is only just preferred, by 0.2 log-evidence units relative to the
no-lensing model, which is similar to the uncertainty in the evaluated evidence.
Consequently, the evidence gives no support for choosing between the lensing and
no-lensing models. This marginal detection is contrary to previous studies, although these
earlier works did not perform Bayesian model selection, but instead focussed on
goodness-of-fit statistics at the best-fit point in parameter space. One can begin
to reconcile these findings by noting that the parameter constraints for the tSIS
halo model (see the right panel of Figure 6.2) do appear somewhat tighter than
those obtained for simulations of 162 SNIa without the inclusion of a lensing
signal. This does suggest a borderline detection of a lensing signal in the real
SNLS data.


6.2.2 NFW model 


The NFW profile has a radial density distribution given by

$$\rho(r) = \frac{\delta_c\,\rho_c}{(r/r_s)(1 + r/r_s)^2}, \tag{6.5}$$

where $\rho_c = 3H^2(z)/8\pi G$ and $H(z)$ are the critical density and Hubble
parameter, respectively, at the redshift $z$ of the halo. The scale radius $r_s = r_{200}/c$ is
a characteristic radius for the halo, where $r_{200}$ (the virial radius) is the radius
within which the mean mass density of the halo is $200\rho_c$, the dimensionless number $c$
is the concentration parameter, and

$$\delta_c = \frac{200}{3}\,\frac{c^3}{\ln(1 + c) - c(1 + c)^{-1}} \tag{6.6}$$

is a characteristic overdensity. The profile therefore depends on two free
parameters: the virial radius $r_{200}$ and the concentration parameter $c$.

The mass contained within the virial radius $r_{200}$ is

$$M_{200} \equiv M(r < r_{200}) = \frac{800\pi}{3}\,\rho_c\,r_{200}^3 = \frac{800\pi}{3}\,\frac{\rho_m}{\Omega_m}\,r_{200}^3, \tag{6.7}$$

where $\rho_m$ is the mean matter density of the universe and $\Omega_m$ is the matter density
parameter, both evaluated at the redshift $z$ of the halo.
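The relations in Eqs. 6.5 to 6.7 are simple enough to evaluate directly. The sketch below is only an illustration: the function names, the choice of SI units and the example values ($H = 70$ km/s/Mpc, $r_{200} = 200$ kpc, $c = 5$) are assumptions made here and are not taken from Paper II.

```python
import math

G = 6.674e-11  # gravitational constant in m^3 kg^-1 s^-2


def critical_density(hubble_si):
    """Critical density rho_c = 3 H^2 / (8 pi G), with H given in s^-1."""
    return 3.0 * hubble_si**2 / (8.0 * math.pi * G)


def characteristic_overdensity(c):
    """Characteristic overdensity delta_c of Eq. 6.6 for concentration c > 0."""
    return (200.0 / 3.0) * c**3 / (math.log(1.0 + c) - c / (1.0 + c))


def m200(r200, hubble_si):
    """Mass inside r200 (Eq. 6.7): a sphere of mean density 200 * rho_c."""
    return (800.0 * math.pi / 3.0) * critical_density(hubble_si) * r200**3


def nfw_density(r, r200, c, hubble_si):
    """NFW density of Eq. 6.5 at radius r (same length unit as r200)."""
    x = r * c / r200  # r / r_s, using r_s = r200 / c
    rho_c = critical_density(hubble_si)
    return characteristic_overdensity(c) * rho_c / (x * (1.0 + x) ** 2)


# Illustrative (assumed) values, converted to SI units.
MPC_IN_M = 3.0857e22
hubble = 70.0e3 / MPC_IN_M       # 70 km/s/Mpc in s^-1
r200 = 0.2 * MPC_IN_M            # 200 kpc in metres
print(m200(r200, hubble))        # halo mass in kg
```

Integrating Eq. 6.5 analytically out to $r_{200}$ gives $4\pi\delta_c\rho_c r_s^3[\ln(1+c) - c/(1+c)]$, which reduces to Eq. 6.7 once Eq. 6.6 is substituted; this is a convenient consistency check on any implementation.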

Figure 6.3: 1D and 2D marginalised posterior distributions for the parameters
h = {A, c, $r_{200}^{*}$} of the NFW halo model, derived from the analysis of 162
simulated SNIa generated assuming no lensing (right) and an NFW halo
model (left). In the left-hand panel, true parameters are indicated by vertical
lines and crosses in the 1D and 2D plots, respectively.

Figure 6.4: As in Figure 6.2, but for h = {A, c, $r_{200}^{*}$} of the NFW model.

                  Simulation model
Analysis model    No lensing    tSIS     NFW
No lensing            0.0        0.0     0.0
tSIS                 -1.8       -1.1    -1.3
NFW                  -3.1       -2.6    -2.3

Table 6.2: $\Delta \log Z$ (log-evidence value relative to the null evidence) for the
analysis of simulated data with 162 SNIa, with errors of 0.2.


To evaluate the gravitational lensing effect of an NFW halo, one must
calculate its surface density, which is given by (Bartelmann 1996)

$$\Sigma(\xi) =
\begin{cases}
\dfrac{2 r_s \delta_c \rho_c}{x^2 - 1}\left[1 - \dfrac{2}{\sqrt{1 - x^2}}\,\mathrm{arctanh}\sqrt{\dfrac{1 - x}{1 + x}}\right] & (x < 1) \\[2ex]
\dfrac{2 r_s \delta_c \rho_c}{3} & (x = 1) \\[2ex]
\dfrac{2 r_s \delta_c \rho_c}{x^2 - 1}\left[1 - \dfrac{2}{\sqrt{x^2 - 1}}\,\arctan\sqrt{\dfrac{x - 1}{1 + x}}\right] & (x > 1)
\end{cases}
\tag{6.8}$$

where $x = \xi/r_s$ is a dimensionless projected radial distance in the lens plane.
The resulting convergence $\kappa_{\rm gal}(\theta)$ due to an NFW halo is then obtained by
substituting $\Sigma(\xi)$ into Eq. 5.6.
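The piecewise NFW surface density can be evaluated numerically as in the following sketch. It assumes the standard three-branch form with $x = \xi/r_s$; the function names and the small tolerance used to switch to the $x = 1$ limit are illustrative choices made here, not part of the original analysis code.

```python
import math


def nfw_shape(x):
    """Dimensionless factor F(x), with Sigma = 2 * r_s * delta_c * rho_c * F(x)."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if abs(x - 1.0) < 1e-6:
        return 1.0 / 3.0  # limiting value of both branches at x = 1
    if x < 1.0:
        root = math.sqrt(1.0 - x * x)
        inner = math.atanh(math.sqrt((1.0 - x) / (1.0 + x)))
    else:
        root = math.sqrt(x * x - 1.0)
        inner = math.atan(math.sqrt((x - 1.0) / (1.0 + x)))
    return (1.0 - 2.0 * inner / root) / (x * x - 1.0)


def nfw_surface_density(xi, r_s, delta_c, rho_c):
    """Projected surface density at projected radius xi (same unit as r_s)."""
    return 2.0 * r_s * delta_c * rho_c * nfw_shape(xi / r_s)


# The two outer branches approach the x = 1 value of 1/3 from either side.
for x in (0.5, 0.999, 1.0, 1.001, 2.0):
    print(x, nfw_shape(x))
```

Treating $x = 1$ through its limiting value avoids a 0/0 evaluation, and checking that the neighbouring points agree with it is a quick way to confirm that the signs inside the two square roots have been transcribed correctly.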

A complication arises similar to that encountered for the tSIS profile, namely 
that there is potentially a relationship between the virial radius of a NFW halo 
and the luminosity of the galaxy it surrounds. To allow for and investigate this 
possibility, I therefore adopt the scaling law 



(6.9) 


In this case, one thus seeks to constrain the three parameters h = {A, c, $r_{200}^{*}$}.

I first apply my method to simulated data-sets of the same size (162 SNIa)
as the real SNLS data, as was done for the tSIS model. Figure 6.3 shows that
one cannot obtain any real constraints on the NFW halo parameters, since the
marginalised posteriors look very similar to those obtained from simulations
containing no lensing signal. Increasing the number of SNe in the simulations
to 500, I find that my method does produce constraints on the halo parameters
that are consistent with the input values used in the simulations; this is illustrated
in the left panel of Figure 6.4. As shown in the right panel, however, the analysis
of the real SNLS data yields no useful constraints on the halo parameters.

6.2.3 Model selection between different dark matter halo models

To understand better the marginal detection (at best) of any lensing signal in the
real SNLS data, I now perform a systematic Bayesian model comparison using
my simulated data-sets. I begin by considering simulations of the same
size (each containing 162 SNIa) as the real SNLS data. Table 6.2 lists the
Bayesian log-evidence for each analysis model, relative in each case to the null
(no-lensing) model. In each case, the no-lensing model is preferred. This
concurs with my findings for the real SNLS data, and suggests that one cannot
detect a lensing signal with data of this quantity and quality, let alone distinguish
between different halo models.

                  Simulation model
Analysis model    No lensing    tSIS     NFW
No lensing            0.0        0.0     0.0
tSIS                 -1.7        4.5*    6.9
NFW                  -3.1        3.6     7.2*

Table 6.3: $\Delta \log Z$ (log-evidence value relative to the null evidence) for the
analysis of simulated data with 500 SNIa, with errors of 0.2. Asterisks denote
the cases for which the corresponding halo parameter constraints are plotted in
the left panels of Figs. 6.2 and 6.4, respectively.

Model    $\Delta \log Z$
tSIS         0.2*
NFW         -2.5*

Table 6.4: $\Delta \log Z$ (log-evidence value relative to the null evidence for no
lensing signal) for the analysis of the real SNIa data, with errors of 0.2. Asterisks
denote the cases for which the corresponding halo parameter constraints are
plotted in the right panels of Figs. 6.2 and 6.4, respectively.

Figure 6.5: Left: magnification of 10^ simulated SNIa. Right: magnification for
the 75 lines-of-sight that exhibit the strongest lensing effect. From Paper II.

To determine the nature of the data required to obtain a robust detection of
lensing, I analyse simulations each containing 500 SNIa. The results are given
in Table 6.3. For simulations containing no lensing signal, the method correctly
identifies this as the preferred model. More importantly, however, my method
also prefers the models with lensing (at high significance) for simulations that do
contain a lensing signal. This demonstrates that the method performs correctly.
Moreover, the correct halo model is also picked out, although the selection
between halo models is not robust, as the log-evidence differences are quite small.

Finally, Table 6.4 shows the results using real data. As mentioned above,
for the tSIS halo model, there is a very slight preference for a lensing signal,
but only by 0.2 log-evidence units, which is the level of the uncertainty in the
calculation of the evidence. For the NFW halo model, however, the presence of
a lensing signal is strongly disfavoured, by $-2.5$ log-evidence units relative to
the no-lensing model.
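The model comparison for the 500-SNIa simulations can be reproduced mechanically from the tabulated log-evidences. The following is a small sketch that uses the $\Delta \log Z$ values of Table 6.3; the dictionary layout and function names are illustrative choices made here.

```python
# Delta log Z from Table 6.3 (500 simulated SNIa).
# Outer key: simulation model; inner key: analysis model.
DELTA_LOG_Z_500 = {
    "No lensing": {"No lensing": 0.0, "tSIS": -1.7, "NFW": -3.1},
    "tSIS": {"No lensing": 0.0, "tSIS": 4.5, "NFW": 3.6},
    "NFW": {"No lensing": 0.0, "tSIS": 6.9, "NFW": 7.2},
}


def preferred_model(delta_log_z):
    """Return the analysis model with the highest log-evidence."""
    return max(delta_log_z, key=delta_log_z.get)


def evidence_gap(delta_log_z):
    """Log-evidence difference between the best and second-best models."""
    ranked = sorted(delta_log_z.values(), reverse=True)
    return ranked[0] - ranked[1]


for simulation, results in DELTA_LOG_Z_500.items():
    print(simulation, preferred_model(results), round(evidence_gap(results), 1))
```

The gap between the two halo models should be read against the quoted error of 0.2 on each log-evidence: it is largest for the simulations without lensing and smallest for the NFW simulations.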

6.2.4 Foreground galaxies catalogue 

In all the analyses described in this section, I have used the true galaxies from 
the SNLS catalogue. To understand my results further, it is of interest to examine 
the magnification along a large number of lines-of-sight through these galaxies. 

In Figure 6.5 (left panel), I plot the magnification factor for each of 10^
simulated SNIa. From the plot, one can see that the background correction
results in most of the SNIa being demagnified. Nonetheless, there are a small
number of SNIa that have very large magnifications. This suggests that one should
expect a large variation in the strength of the lensing signal when analysing SNIa
catalogues containing relatively few events, such as the SNLS data-set. The
important criterion is whether the SNIa catalogue contains one or more SNIa that
are very strongly magnified. As one observes more SNIa, one would expect the
variation between randomly constructed catalogues to diminish, and a more
stable and robust detection of lensing to be possible. This agrees with my results
given above.

Another important observation is that, for strongly magnified SNIa, there
is no clear correlation between the size of the magnification and redshift. This
suggests the counter-intuitive conclusion that observing high-redshift SNIa may
not confer any advantage in attempting to detect a lensing signal. I examine
this issue further by plotting in Figure 6.5 (right panel) the magnification as a
function of redshift along the 75 lines-of-sight that exhibit the highest
magnification. One first notices that the magnification of the three most highly lensed
lines-of-sight continues to increase markedly up to z = 1. Nonetheless, for the
remaining lines-of-sight, the magnification does not typically increase much
beyond z ~ 0.5. Again this suggests that, at least along these lines-of-sight, there
is little advantage in observing a very high-redshift SNIa in terms of detecting a
lensing signal.

One must be careful in drawing such conclusions, however, because I am
using the true galaxy catalogue from SNLS. This real catalogue will inevitably
suffer from selection effects that result in high-redshift galaxies being
under-represented, which is expected to produce the effects described above. One may
investigate this issue further by performing the same analysis for galaxy catalogues
taken from a large numerical simulation, and this would be an interesting
topic for future research. Moreover, forthcoming surveys such as DES and
Euclid will enable us to take a significant step forward, since they will provide not
only SNIa data, but also very good measurements of foreground galaxies.


6.3 SNe photometric classification 


In Section 3.2, I started a discussion about methods that can perform
photometric classification of SNe into Ia and non-Ia types. Since such methods are
becoming increasingly necessary for the analysis of large SNIa data-sets, I also
worked on developing fully automated methods that can perform this task
quickly and robustly. In Paper III, I present a new method
for performing automated photometric classification of SNe into Ia and non-Ia
types. This method adopts an extremely naive approach to the question and
does not use any prior information about SN physics. In Paper V, I therefore use a
HNN to include information about SN models. Both of these methods take a
two-stage approach. First, the SN lightcurves are fitted to an analytic
parameterised function in order to standardise the number of variables associated with
each SN. The resulting fitted parameters, together with a few further quantities
associated with the fit, are then used as the input feature vector to a classification
NN (Paper III) or a hierarchy of classification NNs (Paper V), whose output is the
probability that the SN is of a particular type.


6.3.1 First step: Lightcurve fitting 

The form of the fitted function is given by

$$f(t) = A\left[1 + B(t - t_1)^2\right]\frac{e^{-(t - t_0)/T_{\rm fall}}}{1 + e^{-(t - t_0)/T_{\rm rise}}}, \tag{6.10}$$

where, for each SN, t = 0 corresponds to the time of the earliest measurement
in the r-band lightcurve. Figure 6.6 shows the best-fitting functional form in
four wavebands for a typical Ia and non-Ia SN.

Figure 6.6: Simulated lightcurve measurements and associated uncertainties
(red points) in the g, r, i and z filters for a Type-Ia (top row) and non-Ia (bottom
row) SN, together with the best-fit function (green line). From Paper III.

For each fitted lightcurve, I construct a feature vector that contains the mean
values of the one-dimensional marginalised posteriors of each parameter in the
fitting formula of Eq. 6.10, $\theta = \{A, B, t_1, t_0, T_{\rm rise}, T_{\rm fall}\}$, and their standard
deviations $\sigma = \{\sigma_A, \sigma_B, \sigma_{t_1}, \sigma_{t_0}, \sigma_{T_{\rm rise}}, \sigma_{T_{\rm fall}}\}$. I also append to the feature
vector the number of flux measurements $n$ in the lightcurve, the maximum-
likelihood value of the fit and the Bayesian evidence of the model. This feature
vector then provides a standardised input for the training of the NN.
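To make the first step concrete, here is a minimal sketch of the fitting function and of the assembly of the 15-element feature vector. It assumes the six-parameter form $f(t) = A[1 + B(t - t_1)^2]\,e^{-(t - t_0)/T_{\rm fall}}/(1 + e^{-(t - t_0)/T_{\rm rise}})$ and equally weighted posterior samples; the function names and the sample layout are hypothetical and are not the interface of the code used in Paper III.

```python
import math
from statistics import fmean, pstdev


def lightcurve_flux(t, A, B, t1, t0, T_rise, T_fall):
    """Six-parameter lightcurve model evaluated at time t."""
    rise = 1.0 + math.exp(-(t - t0) / T_rise)
    fall = math.exp(-(t - t0) / T_fall)
    return A * (1.0 + B * (t - t1) ** 2) * fall / rise


def feature_vector(samples, n_points, max_likelihood, evidence):
    """Build the classifier input from posterior samples of one lightcurve fit.

    samples: equally weighted posterior samples (an assumption), each a
             sequence (A, B, t1, t0, T_rise, T_fall).
    Returns 6 posterior means, 6 posterior standard deviations, then the
    number of flux measurements, the maximum-likelihood value of the fit
    and the Bayesian evidence.
    """
    columns = list(zip(*samples))
    means = [fmean(column) for column in columns]
    stds = [pstdev(column) for column in columns]
    return means + stds + [float(n_points), max_likelihood, evidence]


# Two toy posterior samples that differ only in the amplitude A.
toy_samples = [(1.0, 0.0, 0.0, 5.0, 3.0, 20.0), (3.0, 0.0, 0.0, 5.0, 3.0, 20.0)]
print(len(feature_vector(toy_samples, 10, -12.3, -20.1)))
```

Whatever the number or cadence of the observations, every SN is reduced to a vector of the same length, which is what allows a fixed-architecture network to be trained on heterogeneous lightcurves.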


6.3.2 Second step: NN classification 

In Section 2.3, I described the 3-layer feed-forward NNs that I use in my work.
In particular, I use the SkyNet package.

Figure 6.7: Completeness, purity and figure of merit for SNIa classification as a
function of redshift (z) from applying trained networks with a threshold
probability ($p_{\rm th}$) of 0.5; “with z” and “without z” indicate that redshift information
was/was not used in network training. Samples V_1 and V_4 use 10 and 40 per
cent of the data, respectively, for training. From Paper III.

In the application to SN classification, it is important to assess the quality of
the network output classes by constructing some statistical measures. The most
appropriate quantities are the completeness $\epsilon_{\rm Ia}$ (fraction of all SNIa that have
been correctly classified; also often called the efficiency), purity $\tau_{\rm Ia}$ (fraction of
all Type Ia candidates that have been classified correctly) and figure of merit $F_{\rm Ia}$
for SNIa. These are defined as follows:

$$\epsilon_{\rm Ia} = \frac{N_{\rm Ia}^{\rm true}}{N_{\rm Ia}^{\rm total}}, \tag{6.11}$$

$$\tau_{\rm Ia} = \frac{N_{\rm Ia}^{\rm true}}{N_{\rm Ia}^{\rm true} + N_{\rm Ia}^{\rm false}}, \tag{6.12}$$

$$F_{\rm Ia} = \frac{1}{N_{\rm Ia}^{\rm total}}\,\frac{\left(N_{\rm Ia}^{\rm true}\right)^2}{N_{\rm Ia}^{\rm true} + W N_{\rm Ia}^{\rm false}}, \tag{6.13}$$

where $N_{\rm Ia}^{\rm total}$ is the total number of SNIa in the sample, $N_{\rm Ia}^{\rm true}$ is the number of
SNe correctly predicted to be of Type Ia, $N_{\rm Ia}^{\rm false}$ is the number of SNe incorrectly
predicted to be of Type Ia and $W$ is a penalty factor which controls the relative
penalty for false positives over false negatives.
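The three quality measures follow directly from the classification counts. The sketch below assumes the usual definitions (completeness = true positives over all SNIa, purity = true positives over all Type Ia candidates, figure of merit = completeness times a purity in which false positives are weighted by $W$); the function name and the example counts are illustrative, and the value $W = 3$ in the usage lines is an assumption.

```python
def classification_metrics(n_true, n_false, n_total, penalty):
    """Completeness, purity and figure of merit for Type Ia classification.

    n_true:  SNe correctly predicted to be Type Ia
    n_false: SNe incorrectly predicted to be Type Ia
    n_total: all true SNIa in the sample
    penalty: factor W weighting false positives in the figure of merit
    """
    completeness = n_true / n_total
    purity = n_true / (n_true + n_false)
    figure_of_merit = completeness * n_true / (n_true + penalty * n_false)
    return completeness, purity, figure_of_merit


# Illustrative counts chosen to give completeness 0.78 and purity close to 0.77.
completeness, purity, fom = classification_metrics(780, 233, 1000, penalty=3.0)
print(round(completeness, 2), round(purity, 2), round(fom, 2))
```

With $W = 1$ the figure of merit reduces to completeness times purity; larger $W$ penalises contamination of the Type Ia sample more heavily than missed SNIa.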


6.3.3 SuperNova Photometric Classification Challenge (SNPCC)

I started my work on developing methods for photometric classification of SNe
by applying them to the updated simulated data-set released following the SNPCC
(Kessler et al. 2010a,b). I did not apply any cuts to the original data, so my
data-set contained low signal-to-noise SNe and very poorly-sampled lightcurves,
sometimes containing very few measured fluxes that are not necessarily
measured on both sides of peak brightness.

Figure 6.8: Completeness, purity and FoM as a function of threshold probability
$p_{\rm th}$ from applying trained networks to get Ia/non-Ia classification probabilities
on the testing data-set from NN (Paper III) and HNN. No redshift information
was used in network training. Samples P_2 and P_5 use 20 and 50 per cent of the
data, respectively, for training. From Paper V.


NN 

In Paper III, I found that a regular classification network yields very
robust classification results, namely a completeness of 0.78 (0.82), purity of
0.77 (0.82), and SNPCC figure-of-merit of 0.41 (0.50) when I use 10 (40) per
cent of the data for training and assume a canonical threshold output probability
$p_{\rm th} = 0.5$. I pick my training set randomly and do not use any redshift
information. A modest 5-10 per cent improvement in these results is achieved by
also including the SN host-galaxy redshift and its uncertainty as inputs to the
classification network; see Figure 6.7. The quality of the classification does not
depend strongly on the SN redshift.


HNN 

In Paper V, I further develop the method presented in Paper III by introducing
a HNN, which accommodates the structure shown in Figure 3.2. From
Figure 6.8 we see that the HNN performs better than the method in Paper III as more
training data become available. Even with a small amount of data, however,
the HNN still performs well. These positive preliminary results motivate further
study, in particular of the importance of the training sample: is
it the percentage or the total number of SNe used for training that makes the largest
difference? I also want to test this method on a more realistic sample, for example
one in which measurements for some filters are missing completely.


6.4 Testing consistency

In Section 3.3.3, I described the benefits of using a joint analysis of SNe from
different telescopes/surveys, having SNe both at low and high redshift. A large
collection of SNe over a wide range of redshifts results in tighter constraints on
cosmological parameters. This comes at a price, however, since one must first
check that the individual SN surveys produce results that are mutually
consistent. If this is not the case, any results derived from their combination may be
misleading. Such checks for mutual consistency are rarely performed. In
Sections 2.1.1 and 2.1.2, I discussed Bayesian methods to perform such a test. In
this section, I apply these methods to SN data.


6.4.1 Consistency test based on $\chi^2$-method

In Paper IV, I test the mutual consistency of different SN surveys within the JLA
(Betoule et al. 2014) and Union2 compilations. In the same way as in Section 6.1.2,
I separate each compilation into subsets according to the telescope/survey with
which the SNe have been observed. Since the $\chi^2$-method uses a non-normalised
likelihood, one replaces $\Pr(D|\theta, H)$ by the “likelihood” $\mathcal{L}(\theta, \sigma_{\rm int})$ discussed
in Section 4.4. In this case, however, one can no longer interpret the terms
in Eq. 2.5 directly as probabilities. Consequently, the value of $\mathcal{R}$ cannot
be compared with the normal Jeffreys’ scale. One still expects, however, that
for data-sets that are mutually consistent the $\mathcal{R}$-value will be higher than for
inconsistent ones (March et al. 2011b). Consequently, one may still use the
$\mathcal{R}$-value, but as a one-sided test statistic in the frequentist sense, which must be
calibrated using simulations.

The distribution of $\mathcal{R}$ under the null hypothesis $H_0$ is constructed from
simulations in which the individual surveys are mutually consistent. The $\mathcal{R}$ value
obtained by analysing the real data can then be compared with this distribution
in the standard manner. In particular, we calculate the p-value as:

$$p = \frac{N(\mathcal{R}_s < \mathcal{R}_r)}{N_{\rm tot}}, \tag{6.14}$$

where $\mathcal{R}_s$ and $\mathcal{R}_r$ are the $\mathcal{R}$ values obtained by analysing simulated and real
data-sets respectively, $N(\mathcal{R}_s < \mathcal{R}_r)$ is the number of simulations with $\mathcal{R}$ values
less than that obtained by analysing the real data and $N_{\rm tot}$ is the total number of
simulations.
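The calibration in Eq. 6.14 amounts to counting simulations. A minimal sketch, assuming the $\mathcal{R}$ values from the consistent simulations are available as a plain sequence of numbers (the function name and the toy values are illustrative):

```python
def one_sided_p_value(r_simulated, r_real):
    """Fraction of consistent simulations with R below the real-data value."""
    r_simulated = list(r_simulated)
    if not r_simulated:
        raise ValueError("need at least one simulation")
    n_below = sum(1 for r_s in r_simulated if r_s < r_real)
    return n_below / len(r_simulated)


# Toy example: four simulated R values and one real-data value.
print(one_sided_p_value([1.0, 2.0, 3.0, 4.0], 2.5))
```

A small p-value means that few consistent simulations give an $\mathcal{R}$ value as low as the real data, which is the sense in which a low $\mathcal{R}$ signals inconsistency. The resolution of the estimate is limited to $1/N_{\rm tot}$ by the number of simulations.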

My key finding is that the multipliers $\alpha$ and $\beta$ of the stretch and colour
corrections, respectively, are significantly different for the Union2 catalogue (see
the left panel of Figure 6.9). By contrast, the JLA catalogue shows no such
inconsistency. Interestingly, both catalogues show no inconsistency in the
constraints derived on cosmological parameters. Nonetheless, the inconsistency
discovered for the Union2 catalogue means that one must be careful interpreting
the results obtained from a joint analysis of it. The results of the consistency test
for the Union2 compilation are shown in the left panel of Figure 6.9, together
with the corresponding p-values.

Figure 6.9: Left panel: Results of the consistency test applied to survey pairs in
the Union2 compilation. The blue histograms show the distribution of $\mathcal{R}$-values
obtained from 10^ consistent simulations of each pair, and the red vertical line
indicates the $\mathcal{R}$-value obtained from the real data. The corresponding one-sided
p-value is given above each panel. Right panel: Two-dimensional marginalised
constraints on the parameters ($\alpha$, $\beta$) obtained from the individual constituent
surveys contained in the Union2 catalogue. The red (blue) contours denote the
68 and 95 per cent confidence regions for the survey in that row (column). From
Paper IV.

6.4.2 Consistency test based on BHM 

In contrast to the $\chi^2$-method, using the BHM allows one to use the $\mathcal{R}$-test directly,
as described in Section 2.1.1. Since I use here the same data as in Section 6.1.2,
I will use the BHM with two bins for the colour parameter as the default hereafter.

In Tables 6.5 and 6.6, I summarise the results for pairwise consistency checks
of the S11 and Union2 data-sets, respectively. Recall that positive
(negative) values of $\log(\mathcal{R})$ give evidence in favour of consistency
(inconsistency) between data-sets, with the level of consistency (inconsistency)
interpreted according to Jeffreys’ scale given in Table 2.1. From Tables 6.5 and 6.6,
we see that, as for the consistency check based on the $\chi^2$-method, some data-sets
are inconsistent within the Union2 catalogue.

           HST      SDSS     SNLS     CfA
SDSS       3.97     -        -        -
SNLS       1.50     8.33     -        -
Low-z      3.24     8.28    -3.50     -
CfA        3.46     8.09     1.54     6.13

Table 6.5: $\log(\mathcal{R})$-values for pairwise consistency tests of data-sets in the S11
catalogue. Errors on these $\log(\mathcal{R})$-values are all around 0.18.

           ESSENCE   HST      SDSS     CfA
HST         10.62    -        -        -
SDSS       -16.42    6.19     -        -
SNLS         4.84    8.01    13.47     -
CfA          7.47    6.21   -12.17    -0.42

Table 6.6: $\log(\mathcal{R})$-values for pairwise consistency checks of data-sets in the
Union2 catalogue. Errors on these $\log(\mathcal{R})$-values are all around 0.18.

The level of inconsistency between different SN data-sets in the Union2
catalogue again means that one should be careful interpreting the results from joint
analysis performed using this compilation. One should note, however, that the
inconsistent pairings derived in the BHM analyses are different to those found
using the $\chi^2$-method. This is not surprising, as the two methods are very
differently affected by redshift dependence of SN properties, as discussed in
Chapter 4, and the BHM method used here explicitly allows for such an effect.
Nonetheless, in both the BHM and the $\chi^2$-method, the inconsistencies found are
related to the $\beta$ parameter associated with the colour correction.

6.4.3 Hyper-parameters 

Here I present results from analysing the SN data-sets from Section 6.1.2 using 
the hyper-parameter approach described in Section 2.1.2. 

In my work I introduce the hyper-parameters $\gamma_i$, one for each data-set,
by modifying the covariance matrix $C_i$ of data-set $D_i$ to become $C_i/\sqrt{\gamma_i}$. This
explicitly allows for the possibility that the quoted measurement uncertainties
are over- or under-estimated, but may be considered more generally as a
weighting of each data-set. As discussed in Hobson, Bridle, & Lahav (2002), the prior
distribution on the hyper-parameters $\gamma_i$ is exponential with expectation value
unity. This follows because one expects a priori the quoted measurement
uncertainties from each data-set to be neither over- nor under-estimated. With this
constraint, and the requirement that the weights are non-negative, the correct
prior distribution according to the maximum-entropy principle is the
exponential prior.

Data-set    Union2
ESSENCE     0.49 ± 0.11
HST         0.65 ± 0.18
SDSS        0.78 ± 0.22
SNLS        0.46 ± 0.09
CfA         0.17 ± 0.06

Table 6.7: Estimated values of the hyper-parameters $\gamma_i$ for the Union2 catalogue.

First I perform model selection between the two hypotheses:

$H_0$: The combined data-set can be described without hyper-parameters $\gamma_i$, i.e.
the data-sets are all consistent with each other and measurement
uncertainties are neither over- nor under-estimated.

$H_1$: The combined data-set requires the hyper-parameters $\gamma_i$, in order to deal
with inconsistencies between data-sets and/or inaccuracies in the
measurement uncertainties.

For the S11 and Union2 catalogues, $\log(Z_{H_0}/Z_{H_1})$ is found to be $2.30 \pm
0.17$ and $-8.54 \pm 0.18$, respectively. These values point towards
inconsistency between different data-sets in the Union2 catalogue, which is in
agreement with our findings in the last section using the $\mathcal{R}$-test. The S11
catalogue did not show any inconsistency between different data-sets according to
the $\mathcal{R}$-test, which is reinforced by its preference for a model without
hyper-parameters $\gamma_i$.

Figure 6.10: 1D and 2D marginalised posterior distributions for the matter
density $\Omega_{m,0}$ and Hubble parameter $h$ when the Union2 catalogue is analysed with
(red) and without (black) hyper-parameters $\gamma_i$.

Estimated values of the hyper-parameters $\gamma_i$ are given in Table 6.7. In the
absence of any inconsistencies or measurement inaccuracies, one should expect
$\gamma_i \sim 1$; therefore any deviation from unity provides evidence in favour of some
unaccounted systematics in the data-set. It can be seen from Table 6.7 that in the
Union2 catalogue, ESSENCE, SNLS and CfA all have $\gamma_i$ more than $3\sigma$ away
from unity. In order to show the effect of these systematics on cosmological
parameter inferences, we plot the marginalised posterior distributions for the
matter density $\Omega_{m,0}$ and Hubble parameter $h$ when the Union2 catalogue is analysed
with and without hyper-parameters $\gamma_i$. Introduction of hyper-parameters results
in a 15% increase in the estimated value of $\Omega_{m,0}$, as shown in Figure 6.10. Hence
it is important to include these effects.
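The statement about deviations from unity can be checked directly from the values in Table 6.7. The sketch below treats each quoted uncertainty as a symmetric 1$\sigma$ Gaussian error, which is a simplifying assumption made here; the names are illustrative.

```python
# Hyper-parameter estimates (value, uncertainty) from Table 6.7, Union2 catalogue.
GAMMA_UNION2 = {
    "ESSENCE": (0.49, 0.11),
    "HST": (0.65, 0.18),
    "SDSS": (0.78, 0.22),
    "SNLS": (0.46, 0.09),
    "CfA": (0.17, 0.06),
}


def deviation_from_unity(value, error):
    """Distance of a hyper-parameter from 1 in units of its quoted error."""
    return abs(1.0 - value) / error


def discrepant_surveys(table, threshold=3.0):
    """Names of surveys whose hyper-parameter lies beyond the threshold."""
    return [
        name
        for name, (value, error) in table.items()
        if deviation_from_unity(value, error) > threshold
    ]


for name, (value, error) in GAMMA_UNION2.items():
    print(name, round(deviation_from_unity(value, error), 1))
print(discrepant_surveys(GAMMA_UNION2))
```

Under this assumption ESSENCE, SNLS and CfA lie roughly 4.6, 6.0 and 13.8 standard deviations from unity, while HST and SDSS are within about 2.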







Chapter 7 


Outlook 


‘Let the past look after itself, and 
let the present move forwards into 
the future’ 

Douglas Adams 


7.1 Future problems for SN astronomy 

SN data have helped observational cosmology to make a great step forward over the
last two decades. The discovery of the accelerated expansion of the Universe
is undoubtedly one of the most important breakthroughs of the past 20 years.
Unfortunately, its nature still remains unknown. Nevertheless, future SN
surveys will certainly help our understanding of this great mystery of 21st
century physics. Already today, SN data are good enough to perform model
selection between different cosmological scenarios. Future SN surveys will address
all these problems, not only by increasing the number of observed SNe, thereby
reducing the statistical errors, but also by focussing on systematics. The
problems discussed in Chapter 4 regarding rigorous inference methods will be more
important than ever, which will inevitably lead to the development of more
robust methods for cosmological parameter inference.

Another big question in SN studies is the nature of the SN explosion itself.
Standard models for SNIa explosions, such as a carbon-oxygen white dwarf
burning and the merger of two white dwarfs, are increasingly questioned for
their universality. On the other hand, the nature of core-collapse SNe is an even
bigger puzzle, resulting in our inability even to write a parametric function for
the non-Ia lightcurves. The solution to this problem does not relate to cosmology
per se, but will inevitably have a huge influence on cosmological parameter
inference. Knowing the explosion mechanisms will play a crucial role in a better
standardisation of SNe.

Observing SNe in multiple filters proved to be key to their standardisation.
Future surveys will increase the number of filters, which will not only help to
better standardise events, but will open a window for spectroscopy-free SN
cosmology. Having measurements of lightcurves in multiple filters in the future
will allow us to make classifications based only on photometry. Already, some
photometric classification methods seem to be very competitive. Improving
these methods will be one of the main objectives for future photometric surveys.
For example, as was mentioned before, knowing the exact explosion mechanism
for different types of SNe will be a great advantage, since having a parametric
function for all SN types will allow us to make model selection between different
SN types, as opposed to performing the comparison using template methods.

In discussing spectroscopy-free cosmology, however, it is important not to 
belittle the importance of high-quality spectroscopic data. Such data are a 
cornerstone of SN cosmology. New experiments, such as iPTF and ZTF, will have 
on-line spectroscopic follow-up for all detected SNe. Having high-resolution 
spectroscopic data at epochs as close as possible to the explosion will give 
us invaluable information to study the explosion mechanisms. Having a large 
data-set of spectroscopically-classified SNe is also of great importance for 
developing the photometric classification algorithms mentioned above. 

Another interesting path for SN cosmology is to try to standardise non-Ia 
SNe. As I discussed in Section 3.5, SNII-P have proved to be viable candidates 
for making cosmological inferences. With a better understanding of “standard” 
SN types and with “modern” SNe, such as SLSNe, or types yet to be discovered, 
we can try in the future to make a cosmological inference based on more than 
one type, or even using all of them. 

One might summarise the current status of SN cosmology by saying that, 
although it blossomed in 1998, the fruits are only now becoming ripe. 


7.2 My interest for the coming years 

The Bayesian methods I developed during my PhD years are ideally suited for 
investigating the problems discussed in the previous section. Since the statistical 
methods I am working with are very general in nature, I would like to broaden 
my research interests into other neighbouring topics, such as the CMB and 
large-scale structure, as well as more areas in gravitational lensing. In particular, I 
wish to apply my experience with Bayesian parameter estimation and model 
selection to these areas. Building on my work with SNIa, one area of considerable 
interest would be to apply my Bayesian methods for testing the consistency of 
data-sets to a wider range of cosmological probes. There are already hints from 
Planck and SNIa data that the two data-sets disagree on the value of the Hubble 
parameter. This could be a real cosmological effect, or the result of undiagnosed 
systematics. Also, as cosmological data-sets improve, it will become possible 
to distinguish the standard concordance cosmology from alternative models at 
greater significance. As the signal-to-noise ratio of data improves, one 
(somewhat ironically) has to be more careful in performing statistical analyses, as 
systematic effects begin to emerge from the noise. My Bayesian methods are 
again ideally suited to investigate such problems. 
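
As an illustration of what such a consistency test can look like, the sketch 
below computes the evidence ratio R = Z_joint / (Z_1 Z_2) for two Gaussian 
measurements of a single parameter under a flat prior. R > 1 favours the 
hypothesis that both data-sets are described by one common parameter value, 
while R < 1 favours separate values. This is a toy example and not necessarily 
the method used in this thesis: the function names, the prior range and the two 
measurements (68 ± 1 and 73 ± 2, in arbitrary units) are illustrative 
assumptions, not real Planck or SNIa results. 

```python
import math


def gaussian(x, mean, sigma):
    """Normal probability density evaluated at x."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / norm


def evidence(likelihood, lo, hi, n=20001):
    """Evidence for a flat prior on [lo, hi], integrated with the trapezoid rule."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * likelihood(lo + i * step)
    # The flat prior density is 1 / (hi - lo).
    return total * step / (hi - lo)


def consistency_ratio(d1, s1, d2, s2, lo, hi):
    """R = Z_joint / (Z_1 * Z_2) for two Gaussian measurements d1 +/- s1, d2 +/- s2."""
    z1 = evidence(lambda h: gaussian(d1, h, s1), lo, hi)
    z2 = evidence(lambda h: gaussian(d2, h, s2), lo, hi)
    z_joint = evidence(
        lambda h: gaussian(d1, h, s1) * gaussian(d2, h, s2), lo, hi
    )
    return z_joint / (z1 * z2)


# Hypothetical measurements and prior range, chosen only for illustration.
r = consistency_ratio(68.0, 1.0, 73.0, 2.0, 40.0, 100.0)
print(f"R = {r:.3f}")
```

For these made-up numbers the ratio comes out below one, whereas two 
measurements with identical central values give a ratio well above one for the 
same prior range. Since R scales with the width of the prior, it should always 
be quoted together with the prior that was assumed. 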

As the final words of my “The supernova cosmology cookbook”, I would 
like to quote my favourite physicist, Lev Davidovich Landau: “Cosmologists are 
often in error, but never in doubt”. 

Финальными словами моей “Поваренной книги о сверхновых в космологии: 
Байесовские численные рецепты” я хочу процитировать 
моего любимого физика Льва Давидовича Ландау “Астрономы часто 
ошибаются, но никогда не сомневаются”. 
Appendix A 


Princess cake 

A.1 Ingredients 

Cake: 

• 4 eggs 

• 2 dl sugar 

• 1 dl flour 

• 1 dl potato flour 

• 2 teaspoons baking powder 

Filling: 

• 4 sheets of gelatine 

• 2 dl vanilla sauce 

• 2 teaspoons vanilla sugar 

• 3 dl whipping cream 

Garnish: 

• 300 g marzipan 

• green and red food dye 

• icing sugar 




A.2 What to do 

Cake: 

1. Preheat the oven to 175 °C. 

2. Grease and dust a round cake tin of about 2.5 litres. 

3. Beat the eggs and sugar until fluffy. Mix the two flours and the baking powder, 
fold into the batter and mix well. Pour into the tin. Bake in the lower 
part of the oven for about 40 minutes. Turn the cake out and let it cool. 

Filling: 

1. Soak the gelatine leaves in cold water. Boil the vanilla sauce mix 
according to the package directions. Remove the gelatine leaves from the water 
and squeeze them well. Place them in the hot vanilla sauce. 

2. Whip the cream until thick. Stir in the vanilla sauce when the cream starts 
to thicken. Let the cream become almost solid. 

Cut the cake into 3 layers (for 1 cake). The top layer should be slightly less than 
1 inch thick. Sandwich the layers with filling in between, saving a little of the 
filling. Let the filling be slightly higher in the middle so that the cake will be 
puffy when the uppermost thin layer is put on. Spread the rest of the cream on top 
and around the edge. 

Garnish: 

1. Colour about 70% of the marzipan green and the remaining marzipan red 
(it will become more pink than red). 

2. Roll the green marzipan into a round, thin, smooth sheet. It should be 
large enough to cover the entire cake with an even thickness. 

3. Cut out a circular sector with an area of about 30% of the area of the full 
sheet. 

4. Roll the pink marzipan into a sheet big enough to cover the piece which was 
cut out from the green. 

5. Cut out a circular sector of the same size as the one which was removed 
before. 

6. Cover the cake with the green and pink marzipan pieces. 

7. Sift icing sugar on top. 



As is done in many Swedish bakeries, put a pretty pink marzipan rose on top. 
Ideally, find a rose with a mass of approximately 5% of the total marzipan 
mass composing your cosmology cake. 
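
The proportions in the garnish steps can be turned into numbers with a few 
lines of Python. This is only a sketch: the function name `marzipan_plan` is 
hypothetical, and it assumes that the rose is a separate piece whose 5% is 
measured against the 300 g of marzipan in the ingredient list, which the recipe 
does not state explicitly. 

```python
def marzipan_plan(total_g=300, green_pct=70, sector_pct=30, rose_pct=5):
    """Return marzipan masses in grams and the cut-out sector angle in degrees."""
    green_g = total_g * green_pct / 100   # marzipan coloured green
    pink_g = total_g - green_g            # the remainder is coloured red (pink)
    rose_g = total_g * rose_pct / 100     # assumed: rose mass relative to total_g
    sector_deg = 360 * sector_pct / 100   # opening angle of the removed sector
    return {
        "green_g": green_g,
        "pink_g": pink_g,
        "rose_g": rose_g,
        "sector_deg": sector_deg,
    }


plan = marzipan_plan()
for name, value in plan.items():
    print(f"{name}: {value:g}")
```

With the default values this gives the masses to weigh out before colouring and 
the angle to mark on the green sheet before cutting. 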

A.3 When to eat 

Princess cake is perfect for any occasion, but best (in the author’s opinion) with a 
cup of Swedish filter coffee on a cold winter day which lasts for four months. 